git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
@ 2022-01-05 13:23 Jessica Clarke
  2022-01-05 16:39 ` Konstantin Khomoutov
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Jessica Clarke @ 2022-01-05 13:23 UTC (permalink / raw)
  To: git; +Cc: Jessica Clarke

On CHERI, and thus Arm's Morello prototype, pointers are implemented as
hardware capabilities which, as well as having a normal integer address,
have additional bounds, permissions and other metadata in a second word.
In order to preserve this metadata, uintptr_t is also implemented as a
capability, not a plain integer, which causes problems for binary
operators, as the metadata preserved in the output can only come from
one of the inputs. In most cases this is clear, as normally at least one
operand is provably a plain integer, but if both operands are uintptr_t
and have no indication they're just plain integers then it is ambiguous,
and the current implementation will arbitrarily, but deterministically,
pick the left-hand side, due to empirical evidence that it is more
likely to be correct.

In this instance, both operands are of type uintptr_t, with one being a
function argument and one being cast from a pointer, so both could be
valid pointers. Moreover, the left-hand side is not the actual pointer.
This means that, currently, the code when run on a CHERI architecture
will preserve the metadata from the integer, i.e. an invalid capability
that will trap on deference, and not the pointer.

This can be addressed by changing the type of the function argument in
order to more clearly convey intent, both to the compiler so it knows to
generate the right code but also to the developer so it's clear that the
argument is not in fact a pointer but just a plain integer (in this case
being either APPLY_SYMLINK_GOES_AWAY or APPLY_SYMLINK_IN_RESULT).

Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
---
 apply.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/apply.c b/apply.c
index fed195250b..7c7d56cacb 100644
--- a/apply.c
+++ b/apply.c
@@ -3814,7 +3814,7 @@ static int check_to_create(struct apply_state *state,
 
 static uintptr_t register_symlink_changes(struct apply_state *state,
 					  const char *path,
-					  uintptr_t what)
+					  size_t what)
 {
 	struct string_list_item *ent;
 
@@ -3823,7 +3823,7 @@ static uintptr_t register_symlink_changes(struct apply_state *state,
 		ent = string_list_insert(&state->symlink_changes, path);
 		ent->util = (void *)0;
 	}
-	ent->util = (void *)(what | ((uintptr_t)ent->util));
+	ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util));
 	return (uintptr_t)ent->util;
 }
 
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
  2022-01-05 13:23 [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello Jessica Clarke
@ 2022-01-05 16:39 ` Konstantin Khomoutov
  2022-01-05 16:40   ` Jessica Clarke
  2022-01-06 22:50 ` Taylor Blau
  2022-01-06 22:53 ` Junio C Hamano
  2 siblings, 1 reply; 14+ messages in thread
From: Konstantin Khomoutov @ 2022-01-05 16:39 UTC (permalink / raw)
  To: Jessica Clarke; +Cc: git

On Wed, Jan 05, 2022 at 01:23:10PM +0000, Jessica Clarke wrote:

[...]
> This means that, currently, the code when run on a CHERI architecture
> will preserve the metadata from the integer, i.e. an invalid capability
> that will trap on deference, and not the pointer.

^ You have probably meant to use "dereference" here.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
  2022-01-05 16:39 ` Konstantin Khomoutov
@ 2022-01-05 16:40   ` Jessica Clarke
  0 siblings, 0 replies; 14+ messages in thread
From: Jessica Clarke @ 2022-01-05 16:40 UTC (permalink / raw)
  To: Konstantin Khomoutov; +Cc: git

On 5 Jan 2022, at 16:39, Konstantin Khomoutov <kostix@bswap.ru> wrote:
> 
> On Wed, Jan 05, 2022 at 01:23:10PM +0000, Jessica Clarke wrote:
> 
> [...]
>> This means that, currently, the code when run on a CHERI architecture
>> will preserve the metadata from the integer, i.e. an invalid capability
>> that will trap on deference, and not the pointer.
> 
> ^ You have probably meant to use "dereference" here.

Indeed I did; spellcheck doesn’t help for wrong words...

Should I send a v2, or can that be fixed on git-am?

Jess


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
  2022-01-05 13:23 [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello Jessica Clarke
  2022-01-05 16:39 ` Konstantin Khomoutov
@ 2022-01-06 22:50 ` Taylor Blau
  2022-01-06 22:57   ` Jessica Clarke
  2022-01-06 22:53 ` Junio C Hamano
  2 siblings, 1 reply; 14+ messages in thread
From: Taylor Blau @ 2022-01-06 22:50 UTC (permalink / raw)
  To: Jessica Clarke; +Cc: git

On Wed, Jan 05, 2022 at 01:23:10PM +0000, Jessica Clarke wrote:
> [...] In most cases this is clear, as normally at least one operand is
> provably a plain integer, but if both operands are uintptr_t and have
> no indication they're just plain integers then it is ambiguous, and
> the current implementation will arbitrarily, but deterministically,
> pick the left-hand side, due to empirical evidence that it is more
> likely to be correct.

Wouldn't a simpler, less invasive fix be to instead write the expression
so that the left-hand operand is a pointer? IOW, shouldn't the following
work (with no other changes):

    ent->util = (void *)((uintptr_t)what | ent->util);

?

(I dropped the explicit cast on the right-hand side, since ent->util is
already a uintptr_t, and the left-hand side has an explicit cast, so
there isn't any type promotion going on here).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
  2022-01-05 13:23 [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello Jessica Clarke
  2022-01-05 16:39 ` Konstantin Khomoutov
  2022-01-06 22:50 ` Taylor Blau
@ 2022-01-06 22:53 ` Junio C Hamano
  2022-01-06 23:02   ` Jessica Clarke
  2022-01-07 12:16   ` René Scharfe
  2 siblings, 2 replies; 14+ messages in thread
From: Junio C Hamano @ 2022-01-06 22:53 UTC (permalink / raw)
  To: Jessica Clarke; +Cc: git

Jessica Clarke <jrtc27@jrtc27.com> writes:

> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
> hardware capabilities which, as well as having a normal integer address,
> have additional bounds, permissions and other metadata in a second word.
> In order to preserve this metadata, uintptr_t is also implemented as a
> capability, not a plain integer, which causes problems for binary
> operators, as the metadata preserved in the output can only come from
> one of the inputs. In most cases this is clear, as normally at least one
> operand is provably a plain integer, but if both operands are uintptr_t
> and have no indication they're just plain integers then it is ambiguous,
> and the current implementation will arbitrarily, but deterministically,
> pick the left-hand side, due to empirical evidence that it is more
> likely to be correct.

What's left-hand side in the context of the code you changed?
Between "what" vs "ent->util" you take "what"?  That cannot be
true.  Are you referring to the (usually hidden and useless when we
use it as an integer) "hardware capabilities" word as "left" vs the
value of the pointer as "right"?

>  static uintptr_t register_symlink_changes(struct apply_state *state,
>  					  const char *path,
> -					  uintptr_t what)
> +					  size_t what)
>  {
>  	struct string_list_item *ent;
>  
> @@ -3823,7 +3823,7 @@ static uintptr_t register_symlink_changes(struct apply_state *state,
>  		ent = string_list_insert(&state->symlink_changes, path);
>  		ent->util = (void *)0;
>  	}
> -	ent->util = (void *)(what | ((uintptr_t)ent->util));
> +	ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util));
>  	return (uintptr_t)ent->util;
>  }

I actually wonder if it results in code that is much easier to
follow if we did:

 * Introduce an "enum apply_symlink_treatment" that has
   APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its
   possible values;

 * Make register_symlink_changes() and check_symlink_changes()
   work with "enum apply_symlink_treatment";

 * (optional) stop using string_list() to store the symlink_changes;
   use strintmap and use strintmap_set() and strintmap_get() to
   access its entries, so that the ugly implementation detail
   (i.e. "the container type we use only has a (void *) field to
   store caller-supplied data, so we cast an integer and a pointer
   back and forth") can be safely hidden.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
  2022-01-06 22:50 ` Taylor Blau
@ 2022-01-06 22:57   ` Jessica Clarke
  0 siblings, 0 replies; 14+ messages in thread
From: Jessica Clarke @ 2022-01-06 22:57 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git

]On 6 Jan 2022, at 22:50, Taylor Blau <me@ttaylorr.com> wrote:
> 
> On Wed, Jan 05, 2022 at 01:23:10PM +0000, Jessica Clarke wrote:
>> [...] In most cases this is clear, as normally at least one operand is
>> provably a plain integer, but if both operands are uintptr_t and have
>> no indication they're just plain integers then it is ambiguous, and
>> the current implementation will arbitrarily, but deterministically,
>> pick the left-hand side, due to empirical evidence that it is more
>> likely to be correct.
> 
> Wouldn't a simpler, less invasive fix be to instead write the expression
> so that the left-hand operand is a pointer? IOW, shouldn't the following
> work (with no other changes):
> 
>    ent->util = (void *)((uintptr_t)what | ent->util);
> 
> ?
> 
> (I dropped the explicit cast on the right-hand side, since ent->util is
> already a uintptr_t, and the left-hand side has an explicit cast, so
> there isn't any type promotion going on here).

That would still warn as it’s still ambiguous. It just happens that the
arbitrary choice picks the right side. Better to clean up the code to
remove the ambiguity and clarify it to the programmer.

Jess


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
  2022-01-06 22:53 ` Junio C Hamano
@ 2022-01-06 23:02   ` Jessica Clarke
  2022-01-06 23:41     ` Junio C Hamano
  2022-01-07 12:16   ` René Scharfe
  1 sibling, 1 reply; 14+ messages in thread
From: Jessica Clarke @ 2022-01-06 23:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 6 Jan 2022, at 22:53, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Jessica Clarke <jrtc27@jrtc27.com> writes:
> 
>> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
>> hardware capabilities which, as well as having a normal integer address,
>> have additional bounds, permissions and other metadata in a second word.
>> In order to preserve this metadata, uintptr_t is also implemented as a
>> capability, not a plain integer, which causes problems for binary
>> operators, as the metadata preserved in the output can only come from
>> one of the inputs. In most cases this is clear, as normally at least one
>> operand is provably a plain integer, but if both operands are uintptr_t
>> and have no indication they're just plain integers then it is ambiguous,
>> and the current implementation will arbitrarily, but deterministically,
>> pick the left-hand side, due to empirical evidence that it is more
>> likely to be correct.
> 
> What's left-hand side in the context of the code you changed?
> Between "what" vs "ent->util" you take "what"?  That cannot be
> true.  Are you referring to the (usually hidden and useless when we
> use it as an integer) "hardware capabilities" word as "left" vs the
> value of the pointer as "right"?

Left-hand side is what, right-hand side is ((uintptr_t)ent->util). The
bounds/permissions/tag/etc need to be inherited from somewhere, and as
an arbitrary empirically-best choice when otherwise ambiguous we choose
the left, i.e. what. The alternative would just be to error, which will
result in strictly more code failing to build for CHERI despite such a
guess being correct most of the time.

>> static uintptr_t register_symlink_changes(struct apply_state *state,
>> 					  const char *path,
>> -					  uintptr_t what)
>> +					  size_t what)
>> {
>> 	struct string_list_item *ent;
>> 
>> @@ -3823,7 +3823,7 @@ static uintptr_t register_symlink_changes(struct apply_state *state,
>> 		ent = string_list_insert(&state->symlink_changes, path);
>> 		ent->util = (void *)0;
>> 	}
>> -	ent->util = (void *)(what | ((uintptr_t)ent->util));
>> +	ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util));
>> 	return (uintptr_t)ent->util;
>> }
> 
> I actually wonder if it results in code that is much easier to
> follow if we did:
> 
> * Introduce an "enum apply_symlink_treatment" that has
>   APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its
>   possible values;
> 
> * Make register_symlink_changes() and check_symlink_changes()
>   work with "enum apply_symlink_treatment";
> 
> * (optional) stop using string_list() to store the symlink_changes;
>   use strintmap and use strintmap_set() and strintmap_get() to
>   access its entries, so that the ugly implementation detail
>   (i.e. "the container type we use only has a (void *) field to
>   store caller-supplied data, so we cast an integer and a pointer
>   back and forth") can be safely hidden.

Those would be better if you want a less-minimal change. I can easily
do the first two, but the last one may or may not take me a while to
figure out given I’m not familiar with git’s C internals.

Jess


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
  2022-01-06 23:02   ` Jessica Clarke
@ 2022-01-06 23:41     ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2022-01-06 23:41 UTC (permalink / raw)
  To: Jessica Clarke; +Cc: git

Jessica Clarke <jrtc27@jrtc27.com> writes:

>> I actually wonder if it results in code that is much easier to
>> follow if we did:
>> 
>> * Introduce an "enum apply_symlink_treatment" that has
>>   APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its
>>   possible values;
>> 
>> * Make register_symlink_changes() and check_symlink_changes()
>>   work with "enum apply_symlink_treatment";
>> 
>> * (optional) stop using string_list() to store the symlink_changes;
>>   use strintmap and use strintmap_set() and strintmap_get() to
>>   access its entries, so that the ugly implementation detail
>>   (i.e. "the container type we use only has a (void *) field to
>>   store caller-supplied data, so we cast an integer and a pointer
>>   back and forth") can be safely hidden.
>
> Those would be better if you want a less-minimal change.

The first two at least would make an understandable change, as
opposed to the code as posted, which is totally opaque why we need
to take size_t and cast it to uintptr_t in the first place.

I have no confidence in myself that I would not accept a future
patch that reverts the change made by this patch happily.  I usually
try to be careful to go back to the originating commit by running
"blame" on the preimage and may find your commit, but that's only
"usually".  If I were to work only with the file contents after
applying this patch, because no clue ...

	ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util));

... on this line of code hints why we must be called with size_t and
have to cast it here, instead of working with uintptr_t throughout,
I am reasonably sure I'd happily take such a patch and break your
"fix" here.

If we make the code pass around the enum, have a temporary variable
of the same enum type to compute the next value we stuff in
ent->util, and make an assignment of that enum value to ent->util,
it is much less likely that I'd blindly accept a patch to take us
back to deal with uintptr_t directly.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
  2022-01-06 22:53 ` Junio C Hamano
  2022-01-06 23:02   ` Jessica Clarke
@ 2022-01-07 12:16   ` René Scharfe
  2022-01-07 13:00     ` Jessica Clarke
                       ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: René Scharfe @ 2022-01-07 12:16 UTC (permalink / raw)
  To: Junio C Hamano, Jessica Clarke; +Cc: git

Am 06.01.22 um 23:53 schrieb Junio C Hamano:
> Jessica Clarke <jrtc27@jrtc27.com> writes:
>
>> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
>> hardware capabilities which, as well as having a normal integer address,
>> have additional bounds, permissions and other metadata in a second word.
>> In order to preserve this metadata, uintptr_t is also implemented as a
>> capability, not a plain integer, which causes problems for binary
>> operators, as the metadata preserved in the output can only come from
>> one of the inputs. In most cases this is clear, as normally at least one
>> operand is provably a plain integer, but if both operands are uintptr_t
>> and have no indication they're just plain integers then it is ambiguous,
>> and the current implementation will arbitrarily, but deterministically,
>> pick the left-hand side, due to empirical evidence that it is more
>> likely to be correct.
>
> What's left-hand side in the context of the code you changed?
> Between "what" vs "ent->util" you take "what"?  That cannot be
> true.  Are you referring to the (usually hidden and useless when we
> use it as an integer) "hardware capabilities" word as "left" vs the
> value of the pointer as "right"?
>
>>  static uintptr_t register_symlink_changes(struct apply_state *state,
>>  					  const char *path,
>> -					  uintptr_t what)
>> +					  size_t what)
>>  {
>>  	struct string_list_item *ent;
>>
>> @@ -3823,7 +3823,7 @@ static uintptr_t register_symlink_changes(struct apply_state *state,
>>  		ent = string_list_insert(&state->symlink_changes, path);
>>  		ent->util = (void *)0;
>>  	}
>> -	ent->util = (void *)(what | ((uintptr_t)ent->util));
>> +	ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util));
>>  	return (uintptr_t)ent->util;
>>  }
>
> I actually wonder if it results in code that is much easier to
> follow if we did:
>
>  * Introduce an "enum apply_symlink_treatment" that has
>    APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its
>    possible values;
>
>  * Make register_symlink_changes() and check_symlink_changes()
>    work with "enum apply_symlink_treatment";
>
>  * (optional) stop using string_list() to store the symlink_changes;
>    use strintmap and use strintmap_set() and strintmap_get() to
>    access its entries, so that the ugly implementation detail
>    (i.e. "the container type we use only has a (void *) field to
>    store caller-supplied data, so we cast an integer and a pointer
>    back and forth") can be safely hidden.
>
Or strsets -- we only need two.

--- >8 ---
Subject: [PATCH] apply: use strsets to track symlinks

Symlink changes are tracked in a string_list, with the util pointer
value indicating whether a symlink is kept or removed.  Using fake
pointer values requires awkward casts.  Use one strset for each type of
change instead to simplify and shorten the code.

Original-patch-by: Jessica Clarke <jrtc27@jrtc27.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 apply.c | 42 ++++++++----------------------------------
 apply.h | 26 +++++++++++---------------
 2 files changed, 19 insertions(+), 49 deletions(-)

diff --git a/apply.c b/apply.c
index fed195250b..7deb4f79fd 100644
--- a/apply.c
+++ b/apply.c
@@ -103,7 +103,8 @@ int init_apply_state(struct apply_state *state,
 	state->linenr = 1;
 	string_list_init_nodup(&state->fn_table);
 	string_list_init_nodup(&state->limit_by_name);
-	string_list_init_nodup(&state->symlink_changes);
+	strset_init(&state->removed_symlinks);
+	strset_init(&state->kept_symlinks);
 	strbuf_init(&state->root, 0);

 	git_apply_config();
@@ -117,7 +118,8 @@ int init_apply_state(struct apply_state *state,
 void clear_apply_state(struct apply_state *state)
 {
 	string_list_clear(&state->limit_by_name, 0);
-	string_list_clear(&state->symlink_changes, 0);
+	strset_clear(&state->removed_symlinks);
+	strset_clear(&state->kept_symlinks);
 	strbuf_release(&state->root);

 	/* &state->fn_table is cleared at the end of apply_patch() */
@@ -3812,59 +3814,31 @@ static int check_to_create(struct apply_state *state,
 	return 0;
 }

-static uintptr_t register_symlink_changes(struct apply_state *state,
-					  const char *path,
-					  uintptr_t what)
-{
-	struct string_list_item *ent;
-
-	ent = string_list_lookup(&state->symlink_changes, path);
-	if (!ent) {
-		ent = string_list_insert(&state->symlink_changes, path);
-		ent->util = (void *)0;
-	}
-	ent->util = (void *)(what | ((uintptr_t)ent->util));
-	return (uintptr_t)ent->util;
-}
-
-static uintptr_t check_symlink_changes(struct apply_state *state, const char *path)
-{
-	struct string_list_item *ent;
-
-	ent = string_list_lookup(&state->symlink_changes, path);
-	if (!ent)
-		return 0;
-	return (uintptr_t)ent->util;
-}
-
 static void prepare_symlink_changes(struct apply_state *state, struct patch *patch)
 {
 	for ( ; patch; patch = patch->next) {
 		if ((patch->old_name && S_ISLNK(patch->old_mode)) &&
 		    (patch->is_rename || patch->is_delete))
 			/* the symlink at patch->old_name is removed */
-			register_symlink_changes(state, patch->old_name, APPLY_SYMLINK_GOES_AWAY);
+			strset_add(&state->removed_symlinks, patch->old_name);

 		if (patch->new_name && S_ISLNK(patch->new_mode))
 			/* the symlink at patch->new_name is created or remains */
-			register_symlink_changes(state, patch->new_name, APPLY_SYMLINK_IN_RESULT);
+			strset_add(&state->kept_symlinks, patch->new_name);
 	}
 }

 static int path_is_beyond_symlink_1(struct apply_state *state, struct strbuf *name)
 {
 	do {
-		unsigned int change;
-
 		while (--name->len && name->buf[name->len] != '/')
 			; /* scan backwards */
 		if (!name->len)
 			break;
 		name->buf[name->len] = '\0';
-		change = check_symlink_changes(state, name->buf);
-		if (change & APPLY_SYMLINK_IN_RESULT)
+		if (strset_contains(&state->kept_symlinks, name->buf))
 			return 1;
-		if (change & APPLY_SYMLINK_GOES_AWAY)
+		if (strset_contains(&state->removed_symlinks, name->buf))
 			/*
 			 * This cannot be "return 0", because we may
 			 * see a new one created at a higher level.
diff --git a/apply.h b/apply.h
index 16202da160..4052da50c0 100644
--- a/apply.h
+++ b/apply.h
@@ -4,6 +4,7 @@
 #include "hash.h"
 #include "lockfile.h"
 #include "string-list.h"
+#include "strmap.h"

 struct repository;

@@ -25,20 +26,6 @@ enum apply_verbosity {
 	verbosity_verbose = 1
 };

-/*
- * We need to keep track of how symlinks in the preimage are
- * manipulated by the patches.  A patch to add a/b/c where a/b
- * is a symlink should not be allowed to affect the directory
- * the symlink points at, but if the same patch removes a/b,
- * it is perfectly fine, as the patch removes a/b to make room
- * to create a directory a/b so that a/b/c can be created.
- *
- * See also "struct string_list symlink_changes" in "struct
- * apply_state".
- */
-#define APPLY_SYMLINK_GOES_AWAY 01
-#define APPLY_SYMLINK_IN_RESULT 02
-
 struct apply_state {
 	const char *prefix;

@@ -86,7 +73,16 @@ struct apply_state {

 	/* Various "current state" */
 	int linenr; /* current line number */
-	struct string_list symlink_changes; /* we have to track symlinks */
+	/*
+	 * We need to keep track of how symlinks in the preimage are
+	 * manipulated by the patches.  A patch to add a/b/c where a/b
+	 * is a symlink should not be allowed to affect the directory
+	 * the symlink points at, but if the same patch removes a/b,
+	 * it is perfectly fine, as the patch removes a/b to make room
+	 * to create a directory a/b so that a/b/c can be created.
+	 */
+	struct strset removed_symlinks;
+	struct strset kept_symlinks;

 	/*
 	 * For "diff-stat" like behaviour, we keep track of the biggest change
--
2.34.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
  2022-01-07 12:16   ` René Scharfe
@ 2022-01-07 13:00     ` Jessica Clarke
  2022-01-07 19:40     ` Junio C Hamano
  2022-01-07 23:25     ` Junio C Hamano
  2 siblings, 0 replies; 14+ messages in thread
From: Jessica Clarke @ 2022-01-07 13:00 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git

On 7 Jan 2022, at 12:16, René Scharfe <l.s.r@web.de> wrote:
> 
> Am 06.01.22 um 23:53 schrieb Junio C Hamano:
>> Jessica Clarke <jrtc27@jrtc27.com> writes:
>> 
>>> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
>>> hardware capabilities which, as well as having a normal integer address,
>>> have additional bounds, permissions and other metadata in a second word.
>>> In order to preserve this metadata, uintptr_t is also implemented as a
>>> capability, not a plain integer, which causes problems for binary
>>> operators, as the metadata preserved in the output can only come from
>>> one of the inputs. In most cases this is clear, as normally at least one
>>> operand is provably a plain integer, but if both operands are uintptr_t
>>> and have no indication they're just plain integers then it is ambiguous,
>>> and the current implementation will arbitrarily, but deterministically,
>>> pick the left-hand side, due to empirical evidence that it is more
>>> likely to be correct.
>> 
>> What's left-hand side in the context of the code you changed?
>> Between "what" vs "ent->util" you take "what"?  That cannot be
>> true.  Are you referring to the (usually hidden and useless when we
>> use it as an integer) "hardware capabilities" word as "left" vs the
>> value of the pointer as "right"?
>> 
>>> static uintptr_t register_symlink_changes(struct apply_state *state,
>>> 					  const char *path,
>>> -					  uintptr_t what)
>>> +					  size_t what)
>>> {
>>> 	struct string_list_item *ent;
>>> 
>>> @@ -3823,7 +3823,7 @@ static uintptr_t register_symlink_changes(struct apply_state *state,
>>> 		ent = string_list_insert(&state->symlink_changes, path);
>>> 		ent->util = (void *)0;
>>> 	}
>>> -	ent->util = (void *)(what | ((uintptr_t)ent->util));
>>> +	ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util));
>>> 	return (uintptr_t)ent->util;
>>> }
>> 
>> I actually wonder if it results in code that is much easier to
>> follow if we did:
>> 
>> * Introduce an "enum apply_symlink_treatment" that has
>>   APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its
>>   possible values;
>> 
>> * Make register_symlink_changes() and check_symlink_changes()
>>   work with "enum apply_symlink_treatment";
>> 
>> * (optional) stop using string_list() to store the symlink_changes;
>>   use strintmap and use strintmap_set() and strintmap_get() to
>>   access its entries, so that the ugly implementation detail
>>   (i.e. "the container type we use only has a (void *) field to
>>   store caller-supplied data, so we cast an integer and a pointer
>>   back and forth") can be safely hidden.
>> 
> Or strsets -- we only need two.
> 
> --- >8 ---
> Subject: [PATCH] apply: use strsets to track symlinks
> 
> Symlink changes are tracked in a string_list, with the util pointer
> value indicating whether a symlink is kept or removed.  Using fake
> pointer values requires awkward casts.  Use one strset for each type of
> change instead to simplify and shorten the code.
> 
> Original-patch-by: Jessica Clarke <jrtc27@jrtc27.com>
> Signed-off-by: René Scharfe <l.s.r@web.de>

Thanks, this patch makes sense to me. Incidentally, seeing the bigger
picture as a result of this patch touching everywhere that used that
list, I can see that in fact the existing code would have worked, just
with the compiler warning that something potentially iffy was going on.
I had assumed ent->util was still sometimes storing an actual pointer,
with the low bits being used as flags, as many things tend to do, but
in fact it was always NULL plus a couple of flag bits, so both sides of
the | always had the same bounds/permissions/tag, that of NULL (i.e.
tag cleared as invalid, full bounds). This still looks like a nice
cleanup though.

Jess

> ---
> apply.c | 42 ++++++++----------------------------------
> apply.h | 26 +++++++++++---------------
> 2 files changed, 19 insertions(+), 49 deletions(-)
> 
> diff --git a/apply.c b/apply.c
> index fed195250b..7deb4f79fd 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -103,7 +103,8 @@ int init_apply_state(struct apply_state *state,
> 	state->linenr = 1;
> 	string_list_init_nodup(&state->fn_table);
> 	string_list_init_nodup(&state->limit_by_name);
> -	string_list_init_nodup(&state->symlink_changes);
> +	strset_init(&state->removed_symlinks);
> +	strset_init(&state->kept_symlinks);
> 	strbuf_init(&state->root, 0);
> 
> 	git_apply_config();
> @@ -117,7 +118,8 @@ int init_apply_state(struct apply_state *state,
> void clear_apply_state(struct apply_state *state)
> {
> 	string_list_clear(&state->limit_by_name, 0);
> -	string_list_clear(&state->symlink_changes, 0);
> +	strset_clear(&state->removed_symlinks);
> +	strset_clear(&state->kept_symlinks);
> 	strbuf_release(&state->root);
> 
> 	/* &state->fn_table is cleared at the end of apply_patch() */
> @@ -3812,59 +3814,31 @@ static int check_to_create(struct apply_state *state,
> 	return 0;
> }
> 
> -static uintptr_t register_symlink_changes(struct apply_state *state,
> -					  const char *path,
> -					  uintptr_t what)
> -{
> -	struct string_list_item *ent;
> -
> -	ent = string_list_lookup(&state->symlink_changes, path);
> -	if (!ent) {
> -		ent = string_list_insert(&state->symlink_changes, path);
> -		ent->util = (void *)0;
> -	}
> -	ent->util = (void *)(what | ((uintptr_t)ent->util));
> -	return (uintptr_t)ent->util;
> -}
> -
> -static uintptr_t check_symlink_changes(struct apply_state *state, const char *path)
> -{
> -	struct string_list_item *ent;
> -
> -	ent = string_list_lookup(&state->symlink_changes, path);
> -	if (!ent)
> -		return 0;
> -	return (uintptr_t)ent->util;
> -}
> -
> static void prepare_symlink_changes(struct apply_state *state, struct patch *patch)
> {
> 	for ( ; patch; patch = patch->next) {
> 		if ((patch->old_name && S_ISLNK(patch->old_mode)) &&
> 		    (patch->is_rename || patch->is_delete))
> 			/* the symlink at patch->old_name is removed */
> -			register_symlink_changes(state, patch->old_name, APPLY_SYMLINK_GOES_AWAY);
> +			strset_add(&state->removed_symlinks, patch->old_name);
> 
> 		if (patch->new_name && S_ISLNK(patch->new_mode))
> 			/* the symlink at patch->new_name is created or remains */
> -			register_symlink_changes(state, patch->new_name, APPLY_SYMLINK_IN_RESULT);
> +			strset_add(&state->kept_symlinks, patch->new_name);
> 	}
> }
> 
> static int path_is_beyond_symlink_1(struct apply_state *state, struct strbuf *name)
> {
> 	do {
> -		unsigned int change;
> -
> 		while (--name->len && name->buf[name->len] != '/')
> 			; /* scan backwards */
> 		if (!name->len)
> 			break;
> 		name->buf[name->len] = '\0';
> -		change = check_symlink_changes(state, name->buf);
> -		if (change & APPLY_SYMLINK_IN_RESULT)
> +		if (strset_contains(&state->kept_symlinks, name->buf))
> 			return 1;
> -		if (change & APPLY_SYMLINK_GOES_AWAY)
> +		if (strset_contains(&state->removed_symlinks, name->buf))
> 			/*
> 			 * This cannot be "return 0", because we may
> 			 * see a new one created at a higher level.
> diff --git a/apply.h b/apply.h
> index 16202da160..4052da50c0 100644
> --- a/apply.h
> +++ b/apply.h
> @@ -4,6 +4,7 @@
> #include "hash.h"
> #include "lockfile.h"
> #include "string-list.h"
> +#include "strmap.h"
> 
> struct repository;
> 
> @@ -25,20 +26,6 @@ enum apply_verbosity {
> 	verbosity_verbose = 1
> };
> 
> -/*
> - * We need to keep track of how symlinks in the preimage are
> - * manipulated by the patches.  A patch to add a/b/c where a/b
> - * is a symlink should not be allowed to affect the directory
> - * the symlink points at, but if the same patch removes a/b,
> - * it is perfectly fine, as the patch removes a/b to make room
> - * to create a directory a/b so that a/b/c can be created.
> - *
> - * See also "struct string_list symlink_changes" in "struct
> - * apply_state".
> - */
> -#define APPLY_SYMLINK_GOES_AWAY 01
> -#define APPLY_SYMLINK_IN_RESULT 02
> -
> struct apply_state {
> 	const char *prefix;
> 
> @@ -86,7 +73,16 @@ struct apply_state {
> 
> 	/* Various "current state" */
> 	int linenr; /* current line number */
> -	struct string_list symlink_changes; /* we have to track symlinks */
> +	/*
> +	 * We need to keep track of how symlinks in the preimage are
> +	 * manipulated by the patches.  A patch to add a/b/c where a/b
> +	 * is a symlink should not be allowed to affect the directory
> +	 * the symlink points at, but if the same patch removes a/b,
> +	 * it is perfectly fine, as the patch removes a/b to make room
> +	 * to create a directory a/b so that a/b/c can be created.
> +	 */
> +	struct strset removed_symlinks;
> +	struct strset kept_symlinks;
> 
> 	/*
> 	 * For "diff-stat" like behaviour, we keep track of the biggest change
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
  2022-01-07 12:16   ` René Scharfe
  2022-01-07 13:00     ` Jessica Clarke
@ 2022-01-07 19:40     ` Junio C Hamano
  2022-01-08  0:04       ` René Scharfe
  2022-01-07 23:25     ` Junio C Hamano
  2 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2022-01-07 19:40 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jessica Clarke, git

René Scharfe <l.s.r@web.de> writes:

>> I actually wonder if it results in code that is much easier to
>> follow if we did:
>>
>>  * Introduce an "enum apply_symlink_treatment" that has
>>    APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its
>>    possible values;
>>
>>  * Make register_symlink_changes() and check_symlink_changes()
>>    work with "enum apply_symlink_treatment";
>>
>>  * (optional) stop using string_list() to store the symlink_changes;
>>    use strintmap and use strintmap_set() and strintmap_get() to
>>    access its entries, so that the ugly implementation detail
>>    (i.e. "the container type we use only has a (void *) field to
>>    store caller-supplied data, so we cast an integer and a pointer
>>    back and forth") can be safely hidden.
>>
> Or strsets -- we only need two.

True.

When we check a path, we make a single look-up of two bit in a
single hashtable but now we need two look-ups, but addition, removal
and renaming of a symlink would be rare enough to matter either way.

Will queue; thanks.

> --- >8 ---
> Subject: [PATCH] apply: use strsets to track symlinks
>
> Symlink changes are tracked in a string_list, with the util pointer
> value indicating whether a symlink is kept or removed.  Using fake
> pointer values requires awkward casts.  Use one strset for each type of
> change instead to simplify and shorten the code.
>
> Original-patch-by: Jessica Clarke <jrtc27@jrtc27.com>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  apply.c | 42 ++++++++----------------------------------
>  apply.h | 26 +++++++++++---------------
>  2 files changed, 19 insertions(+), 49 deletions(-)
>
> diff --git a/apply.c b/apply.c
> index fed195250b..7deb4f79fd 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -103,7 +103,8 @@ int init_apply_state(struct apply_state *state,
>  	state->linenr = 1;
>  	string_list_init_nodup(&state->fn_table);
>  	string_list_init_nodup(&state->limit_by_name);
> -	string_list_init_nodup(&state->symlink_changes);
> +	strset_init(&state->removed_symlinks);
> +	strset_init(&state->kept_symlinks);
>  	strbuf_init(&state->root, 0);
>
>  	git_apply_config();
> @@ -117,7 +118,8 @@ int init_apply_state(struct apply_state *state,
>  void clear_apply_state(struct apply_state *state)
>  {
>  	string_list_clear(&state->limit_by_name, 0);
> -	string_list_clear(&state->symlink_changes, 0);
> +	strset_clear(&state->removed_symlinks);
> +	strset_clear(&state->kept_symlinks);
>  	strbuf_release(&state->root);
>
>  	/* &state->fn_table is cleared at the end of apply_patch() */
> @@ -3812,59 +3814,31 @@ static int check_to_create(struct apply_state *state,
>  	return 0;
>  }
>
> -static uintptr_t register_symlink_changes(struct apply_state *state,
> -					  const char *path,
> -					  uintptr_t what)
> -{
> -	struct string_list_item *ent;
> -
> -	ent = string_list_lookup(&state->symlink_changes, path);
> -	if (!ent) {
> -		ent = string_list_insert(&state->symlink_changes, path);
> -		ent->util = (void *)0;
> -	}
> -	ent->util = (void *)(what | ((uintptr_t)ent->util));
> -	return (uintptr_t)ent->util;
> -}
> -
> -static uintptr_t check_symlink_changes(struct apply_state *state, const char *path)
> -{
> -	struct string_list_item *ent;
> -
> -	ent = string_list_lookup(&state->symlink_changes, path);
> -	if (!ent)
> -		return 0;
> -	return (uintptr_t)ent->util;
> -}
> -
>  static void prepare_symlink_changes(struct apply_state *state, struct patch *patch)
>  {
>  	for ( ; patch; patch = patch->next) {
>  		if ((patch->old_name && S_ISLNK(patch->old_mode)) &&
>  		    (patch->is_rename || patch->is_delete))
>  			/* the symlink at patch->old_name is removed */
> -			register_symlink_changes(state, patch->old_name, APPLY_SYMLINK_GOES_AWAY);
> +			strset_add(&state->removed_symlinks, patch->old_name);
>
>  		if (patch->new_name && S_ISLNK(patch->new_mode))
>  			/* the symlink at patch->new_name is created or remains */
> -			register_symlink_changes(state, patch->new_name, APPLY_SYMLINK_IN_RESULT);
> +			strset_add(&state->kept_symlinks, patch->new_name);
>  	}
>  }
>
>  static int path_is_beyond_symlink_1(struct apply_state *state, struct strbuf *name)
>  {
>  	do {
> -		unsigned int change;
> -
>  		while (--name->len && name->buf[name->len] != '/')
>  			; /* scan backwards */
>  		if (!name->len)
>  			break;
>  		name->buf[name->len] = '\0';
> -		change = check_symlink_changes(state, name->buf);
> -		if (change & APPLY_SYMLINK_IN_RESULT)
> +		if (strset_contains(&state->kept_symlinks, name->buf))
>  			return 1;
> -		if (change & APPLY_SYMLINK_GOES_AWAY)
> +		if (strset_contains(&state->removed_symlinks, name->buf))
>  			/*
>  			 * This cannot be "return 0", because we may
>  			 * see a new one created at a higher level.
> diff --git a/apply.h b/apply.h
> index 16202da160..4052da50c0 100644
> --- a/apply.h
> +++ b/apply.h
> @@ -4,6 +4,7 @@
>  #include "hash.h"
>  #include "lockfile.h"
>  #include "string-list.h"
> +#include "strmap.h"
>
>  struct repository;
>
> @@ -25,20 +26,6 @@ enum apply_verbosity {
>  	verbosity_verbose = 1
>  };
>
> -/*
> - * We need to keep track of how symlinks in the preimage are
> - * manipulated by the patches.  A patch to add a/b/c where a/b
> - * is a symlink should not be allowed to affect the directory
> - * the symlink points at, but if the same patch removes a/b,
> - * it is perfectly fine, as the patch removes a/b to make room
> - * to create a directory a/b so that a/b/c can be created.
> - *
> - * See also "struct string_list symlink_changes" in "struct
> - * apply_state".
> - */
> -#define APPLY_SYMLINK_GOES_AWAY 01
> -#define APPLY_SYMLINK_IN_RESULT 02
> -
>  struct apply_state {
>  	const char *prefix;
>
> @@ -86,7 +73,16 @@ struct apply_state {
>
>  	/* Various "current state" */
>  	int linenr; /* current line number */
> -	struct string_list symlink_changes; /* we have to track symlinks */
> +	/*
> +	 * We need to keep track of how symlinks in the preimage are
> +	 * manipulated by the patches.  A patch to add a/b/c where a/b
> +	 * is a symlink should not be allowed to affect the directory
> +	 * the symlink points at, but if the same patch removes a/b,
> +	 * it is perfectly fine, as the patch removes a/b to make room
> +	 * to create a directory a/b so that a/b/c can be created.
> +	 */
> +	struct strset removed_symlinks;
> +	struct strset kept_symlinks;
>
>  	/*
>  	 * For "diff-stat" like behaviour, we keep track of the biggest change
> --
> 2.34.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
  2022-01-07 12:16   ` René Scharfe
  2022-01-07 13:00     ` Jessica Clarke
  2022-01-07 19:40     ` Junio C Hamano
@ 2022-01-07 23:25     ` Junio C Hamano
  2 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2022-01-07 23:25 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jessica Clarke, git

René Scharfe <l.s.r@web.de> writes:

> Subject: [PATCH] apply: use strsets to track symlinks
>
> Symlink changes are tracked in a string_list, with the util pointer
> value indicating whether a symlink is kept or removed.  Using fake
> pointer values requires awkward casts.  Use one strset for each type of
> change instead to simplify and shorten the code.
>
> Original-patch-by: Jessica Clarke <jrtc27@jrtc27.com>

This may be recording a misleading credit.  The original patch
certainly was a good input to make us realize that we have a
problem, but has no contribution to the final code.

> Signed-off-by: René Scharfe <l.s.r@web.de>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
  2022-01-07 19:40     ` Junio C Hamano
@ 2022-01-08  0:04       ` René Scharfe
  2022-01-08  0:51         ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: René Scharfe @ 2022-01-08  0:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jessica Clarke, git

Am 07.01.22 um 20:40 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>> I actually wonder if it results in code that is much easier to
>>> follow if we did:
>>>
>>>  * Introduce an "enum apply_symlink_treatment" that has
>>>    APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its
>>>    possible values;
>>>
>>>  * Make register_symlink_changes() and check_symlink_changes()
>>>    work with "enum apply_symlink_treatment";
>>>
>>>  * (optional) stop using string_list() to store the symlink_changes;
>>>    use strintmap and use strintmap_set() and strintmap_get() to
>>>    access its entries, so that the ugly implementation detail
>>>    (i.e. "the container type we use only has a (void *) field to
>>>    store caller-supplied data, so we cast an integer and a pointer
>>>    back and forth") can be safely hidden.
>>>
>> Or strsets -- we only need two.
>
> True.
>
> When we check a path, we make a single look-up of two bit in a
> single hashtable but now we need two look-ups, but addition, removal
> and renaming of a symlink would be rare enough to matter either way.

Hmm, symlinks changes are rare, but this only affects the register phase
(which should be noticeably slow for the string_list based code with its
O(n*log(n)) lookup per registered symlink if there were a lot of them).
But the final lookups are done for each path _component_, of any file
type.

I suspect that two lookups in (sparsely populated) hash tables are still
fast enough.  (At least I couldn't measure any difference with git apply
and a patch between v2.33.0 and v2.34.0.)

René

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
  2022-01-08  0:04       ` René Scharfe
@ 2022-01-08  0:51         ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2022-01-08  0:51 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jessica Clarke, git

René Scharfe <l.s.r@web.de> writes:

>>> Or strsets -- we only need two.
>>
>> True.
>>
>> When we check a path, we make a single look-up of two bit in a
>> single hashtable but now we need two look-ups, but addition, removal
>> and renaming of a symlink would be rare enough to matter either way.
>
> Hmm, symlinks changes are rare, but this only affects the register phase
> (which should be noticeably slow for the string_list based code with its
> O(n*log(n)) lookup per registered symlink if there were a lot of them).
> But the final lookups are done for each path _component_, of any file
> type.
>
> I suspect that two lookups in (sparsely populated) hash tables are still
> fast enough.  (At least I couldn't measure any difference with git apply
> and a patch between v2.33.0 and v2.34.0.)

Yup.  I do not think it matters, and the resulting code is certainly
a lot simpler and easier to read ;-)

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-01-08  0:52 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-05 13:23 [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello Jessica Clarke
2022-01-05 16:39 ` Konstantin Khomoutov
2022-01-05 16:40   ` Jessica Clarke
2022-01-06 22:50 ` Taylor Blau
2022-01-06 22:57   ` Jessica Clarke
2022-01-06 22:53 ` Junio C Hamano
2022-01-06 23:02   ` Jessica Clarke
2022-01-06 23:41     ` Junio C Hamano
2022-01-07 12:16   ` René Scharfe
2022-01-07 13:00     ` Jessica Clarke
2022-01-07 19:40     ` Junio C Hamano
2022-01-08  0:04       ` René Scharfe
2022-01-08  0:51         ` Junio C Hamano
2022-01-07 23:25     ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).