* [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello @ 2022-01-05 13:23 Jessica Clarke 2022-01-05 16:39 ` Konstantin Khomoutov ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Jessica Clarke @ 2022-01-05 13:23 UTC (permalink / raw) To: git; +Cc: Jessica Clarke On CHERI, and thus Arm's Morello prototype, pointers are implemented as hardware capabilities which, as well as having a normal integer address, have additional bounds, permissions and other metadata in a second word. In order to preserve this metadata, uintptr_t is also implemented as a capability, not a plain integer, which causes problems for binary operators, as the metadata preserved in the output can only come from one of the inputs. In most cases this is clear, as normally at least one operand is provably a plain integer, but if both operands are uintptr_t and have no indication they're just plain integers then it is ambiguous, and the current implementation will arbitrarily, but deterministically, pick the left-hand side, due to empirical evidence that it is more likely to be correct. In this instance, both operands are of type uintptr_t, with one being a function argument and one being cast from a pointer, so both could be valid pointers. Moreover, the left-hand side is not the actual pointer. This means that, currently, the code when run on a CHERI architecture will preserve the metadata from the integer, i.e. an invalid capability that will trap on deference, and not the pointer. This can be addressed by changing the type of the function argument in order to more clearly convey intent, both to the compiler so it knows to generate the right code but also to the developer so it's clear that the argument is not in fact a pointer but just a plain integer (in this case being either APPLY_SYMLINK_GOES_AWAY or APPLY_SYMLINK_IN_RESULT). Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com> --- apply.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/apply.c b/apply.c index fed195250b..7c7d56cacb 100644 --- a/apply.c +++ b/apply.c @@ -3814,7 +3814,7 @@ static int check_to_create(struct apply_state *state, static uintptr_t register_symlink_changes(struct apply_state *state, const char *path, - uintptr_t what) + size_t what) { struct string_list_item *ent; @@ -3823,7 +3823,7 @@ static uintptr_t register_symlink_changes(struct apply_state *state, ent = string_list_insert(&state->symlink_changes, path); ent->util = (void *)0; } - ent->util = (void *)(what | ((uintptr_t)ent->util)); + ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util)); return (uintptr_t)ent->util; } -- 2.33.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello 2022-01-05 13:23 [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello Jessica Clarke @ 2022-01-05 16:39 ` Konstantin Khomoutov 2022-01-05 16:40 ` Jessica Clarke 2022-01-06 22:50 ` Taylor Blau 2022-01-06 22:53 ` Junio C Hamano 2 siblings, 1 reply; 14+ messages in thread From: Konstantin Khomoutov @ 2022-01-05 16:39 UTC (permalink / raw) To: Jessica Clarke; +Cc: git On Wed, Jan 05, 2022 at 01:23:10PM +0000, Jessica Clarke wrote: [...] > This means that, currently, the code when run on a CHERI architecture > will preserve the metadata from the integer, i.e. an invalid capability > that will trap on deference, and not the pointer. ^ You have probably meant to use "dereference" here. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello 2022-01-05 16:39 ` Konstantin Khomoutov @ 2022-01-05 16:40 ` Jessica Clarke 0 siblings, 0 replies; 14+ messages in thread From: Jessica Clarke @ 2022-01-05 16:40 UTC (permalink / raw) To: Konstantin Khomoutov; +Cc: git On 5 Jan 2022, at 16:39, Konstantin Khomoutov <kostix@bswap.ru> wrote: > > On Wed, Jan 05, 2022 at 01:23:10PM +0000, Jessica Clarke wrote: > > [...] >> This means that, currently, the code when run on a CHERI architecture >> will preserve the metadata from the integer, i.e. an invalid capability >> that will trap on deference, and not the pointer. > > ^ You have probably meant to use "dereference" here. Indeed I did; spellcheck doesn’t help for wrong words... Should I send a v2, or can that be fixed on git-am? Jess ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello 2022-01-05 13:23 [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello Jessica Clarke 2022-01-05 16:39 ` Konstantin Khomoutov @ 2022-01-06 22:50 ` Taylor Blau 2022-01-06 22:57 ` Jessica Clarke 2022-01-06 22:53 ` Junio C Hamano 2 siblings, 1 reply; 14+ messages in thread From: Taylor Blau @ 2022-01-06 22:50 UTC (permalink / raw) To: Jessica Clarke; +Cc: git On Wed, Jan 05, 2022 at 01:23:10PM +0000, Jessica Clarke wrote: > [...] In most cases this is clear, as normally at least one operand is > provably a plain integer, but if both operands are uintptr_t and have > no indication they're just plain integers then it is ambiguous, and > the current implementation will arbitrarily, but deterministically, > pick the left-hand side, due to empirical evidence that it is more > likely to be correct. Wouldn't a simpler, less invasive fix be to instead write the expression so that the left-hand operand is a pointer? IOW, shouldn't the following work (with no other changes): ent->util = (void *)((uintptr_t)what | ent->util); ? (I dropped the explicit cast on the right-hand side, since ent->util is already a uintptr_t, and the left-hand side has an explicit cast, so there isn't any type promotion going on here). Thanks, Taylor ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello 2022-01-06 22:50 ` Taylor Blau @ 2022-01-06 22:57 ` Jessica Clarke 0 siblings, 0 replies; 14+ messages in thread From: Jessica Clarke @ 2022-01-06 22:57 UTC (permalink / raw) To: Taylor Blau; +Cc: git ]On 6 Jan 2022, at 22:50, Taylor Blau <me@ttaylorr.com> wrote: > > On Wed, Jan 05, 2022 at 01:23:10PM +0000, Jessica Clarke wrote: >> [...] In most cases this is clear, as normally at least one operand is >> provably a plain integer, but if both operands are uintptr_t and have >> no indication they're just plain integers then it is ambiguous, and >> the current implementation will arbitrarily, but deterministically, >> pick the left-hand side, due to empirical evidence that it is more >> likely to be correct. > > Wouldn't a simpler, less invasive fix be to instead write the expression > so that the left-hand operand is a pointer? IOW, shouldn't the following > work (with no other changes): > > ent->util = (void *)((uintptr_t)what | ent->util); > > ? > > (I dropped the explicit cast on the right-hand side, since ent->util is > already a uintptr_t, and the left-hand side has an explicit cast, so > there isn't any type promotion going on here). That would still warn as it’s still ambiguous. It just happens that the arbitrary choice picks the right side. Better to clean up the code to remove the ambiguity and clarify it to the programmer. Jess ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello 2022-01-05 13:23 [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello Jessica Clarke 2022-01-05 16:39 ` Konstantin Khomoutov 2022-01-06 22:50 ` Taylor Blau @ 2022-01-06 22:53 ` Junio C Hamano 2022-01-06 23:02 ` Jessica Clarke 2022-01-07 12:16 ` René Scharfe 2 siblings, 2 replies; 14+ messages in thread From: Junio C Hamano @ 2022-01-06 22:53 UTC (permalink / raw) To: Jessica Clarke; +Cc: git Jessica Clarke <jrtc27@jrtc27.com> writes: > On CHERI, and thus Arm's Morello prototype, pointers are implemented as > hardware capabilities which, as well as having a normal integer address, > have additional bounds, permissions and other metadata in a second word. > In order to preserve this metadata, uintptr_t is also implemented as a > capability, not a plain integer, which causes problems for binary > operators, as the metadata preserved in the output can only come from > one of the inputs. In most cases this is clear, as normally at least one > operand is provably a plain integer, but if both operands are uintptr_t > and have no indication they're just plain integers then it is ambiguous, > and the current implementation will arbitrarily, but deterministically, > pick the left-hand side, due to empirical evidence that it is more > likely to be correct. What's left-hand side in the context of the code you changed? Between "what" vs "ent->util" you take "what"? That cannot be true. Are you referring to the (usually hidden and useless when we use it as an integer) "hardware capabilities" word as "left" vs the value of the pointer as "right"? > static uintptr_t register_symlink_changes(struct apply_state *state, > const char *path, > - uintptr_t what) > + size_t what) > { > struct string_list_item *ent; > > @@ -3823,7 +3823,7 @@ static uintptr_t register_symlink_changes(struct apply_state *state, > ent = string_list_insert(&state->symlink_changes, path); > ent->util = (void *)0; > } > - ent->util = (void *)(what | ((uintptr_t)ent->util)); > + ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util)); > return (uintptr_t)ent->util; > } I actually wonder if it results in code that is much easier to follow if we did: * Introduce an "enum apply_symlink_treatment" that has APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its possible values; * Make register_symlink_changes() and check_symlink_changes() work with "enum apply_symlink_treatment"; * (optional) stop using string_list() to store the symlink_changes; use strintmap and use strintmap_set() and strintmap_get() to access its entries, so that the ugly implementation detail (i.e. "the container type we use only has a (void *) field to store caller-supplied data, so we cast an integer and a pointer back and forth") can be safely hidden. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello 2022-01-06 22:53 ` Junio C Hamano @ 2022-01-06 23:02 ` Jessica Clarke 2022-01-06 23:41 ` Junio C Hamano 2022-01-07 12:16 ` René Scharfe 1 sibling, 1 reply; 14+ messages in thread From: Jessica Clarke @ 2022-01-06 23:02 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 6 Jan 2022, at 22:53, Junio C Hamano <gitster@pobox.com> wrote: > > Jessica Clarke <jrtc27@jrtc27.com> writes: > >> On CHERI, and thus Arm's Morello prototype, pointers are implemented as >> hardware capabilities which, as well as having a normal integer address, >> have additional bounds, permissions and other metadata in a second word. >> In order to preserve this metadata, uintptr_t is also implemented as a >> capability, not a plain integer, which causes problems for binary >> operators, as the metadata preserved in the output can only come from >> one of the inputs. In most cases this is clear, as normally at least one >> operand is provably a plain integer, but if both operands are uintptr_t >> and have no indication they're just plain integers then it is ambiguous, >> and the current implementation will arbitrarily, but deterministically, >> pick the left-hand side, due to empirical evidence that it is more >> likely to be correct. > > What's left-hand side in the context of the code you changed? > Between "what" vs "ent->util" you take "what"? That cannot be > true. Are you referring to the (usually hidden and useless when we > use it as an integer) "hardware capabilities" word as "left" vs the > value of the pointer as "right"? Left-hand side is what, right-hand side is ((uintptr_t)ent->util). The bounds/permissions/tag/etc need to be inherited from somewhere, and as an arbitrary empirically-best choice when otherwise ambiguous we choose the left, i.e. what. The alternative would just be to error, which will result in strictly more code failing to build for CHERI despite such a guess being correct most of the time. >> static uintptr_t register_symlink_changes(struct apply_state *state, >> const char *path, >> - uintptr_t what) >> + size_t what) >> { >> struct string_list_item *ent; >> >> @@ -3823,7 +3823,7 @@ static uintptr_t register_symlink_changes(struct apply_state *state, >> ent = string_list_insert(&state->symlink_changes, path); >> ent->util = (void *)0; >> } >> - ent->util = (void *)(what | ((uintptr_t)ent->util)); >> + ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util)); >> return (uintptr_t)ent->util; >> } > > I actually wonder if it results in code that is much easier to > follow if we did: > > * Introduce an "enum apply_symlink_treatment" that has > APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its > possible values; > > * Make register_symlink_changes() and check_symlink_changes() > work with "enum apply_symlink_treatment"; > > * (optional) stop using string_list() to store the symlink_changes; > use strintmap and use strintmap_set() and strintmap_get() to > access its entries, so that the ugly implementation detail > (i.e. "the container type we use only has a (void *) field to > store caller-supplied data, so we cast an integer and a pointer > back and forth") can be safely hidden. Those would be better if you want a less-minimal change. I can easily do the first two, but the last one may or may not take me a while to figure out given I’m not familiar with git’s C internals. Jess ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello 2022-01-06 23:02 ` Jessica Clarke @ 2022-01-06 23:41 ` Junio C Hamano 0 siblings, 0 replies; 14+ messages in thread From: Junio C Hamano @ 2022-01-06 23:41 UTC (permalink / raw) To: Jessica Clarke; +Cc: git Jessica Clarke <jrtc27@jrtc27.com> writes: >> I actually wonder if it results in code that is much easier to >> follow if we did: >> >> * Introduce an "enum apply_symlink_treatment" that has >> APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its >> possible values; >> >> * Make register_symlink_changes() and check_symlink_changes() >> work with "enum apply_symlink_treatment"; >> >> * (optional) stop using string_list() to store the symlink_changes; >> use strintmap and use strintmap_set() and strintmap_get() to >> access its entries, so that the ugly implementation detail >> (i.e. "the container type we use only has a (void *) field to >> store caller-supplied data, so we cast an integer and a pointer >> back and forth") can be safely hidden. > > Those would be better if you want a less-minimal change. The first two at least would make an understandable change, as opposed to the code as posted, which is totally opaque why we need to take size_t and cast it to uintptr_t in the first place. I have no confidence in myself that I would not accept a future patch that reverts the change made by this patch happily. I usually try to be careful to go back to the originating commit by running "blame" on the preimage and may find your commit, but that's only "usually". If I were to work only with the file contents after applying this patch, because no clue ... ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util)); ... on this line of code hints why we must be called with size_t and have to cast it here, instead of working with uintptr_t throughout, I am reasonably sure I'd happily take such a patch and break your "fix" here. If we make the code pass around the enum, have a temporary variable of the same enum type to compute the next value we stuff in ent->util, and make an assignment of that enum value to ent->util, it is much less likely that I'd blindly accept a patch to take us back to deal with uintptr_t directly. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello 2022-01-06 22:53 ` Junio C Hamano 2022-01-06 23:02 ` Jessica Clarke @ 2022-01-07 12:16 ` René Scharfe 2022-01-07 13:00 ` Jessica Clarke ` (2 more replies) 1 sibling, 3 replies; 14+ messages in thread From: René Scharfe @ 2022-01-07 12:16 UTC (permalink / raw) To: Junio C Hamano, Jessica Clarke; +Cc: git Am 06.01.22 um 23:53 schrieb Junio C Hamano: > Jessica Clarke <jrtc27@jrtc27.com> writes: > >> On CHERI, and thus Arm's Morello prototype, pointers are implemented as >> hardware capabilities which, as well as having a normal integer address, >> have additional bounds, permissions and other metadata in a second word. >> In order to preserve this metadata, uintptr_t is also implemented as a >> capability, not a plain integer, which causes problems for binary >> operators, as the metadata preserved in the output can only come from >> one of the inputs. In most cases this is clear, as normally at least one >> operand is provably a plain integer, but if both operands are uintptr_t >> and have no indication they're just plain integers then it is ambiguous, >> and the current implementation will arbitrarily, but deterministically, >> pick the left-hand side, due to empirical evidence that it is more >> likely to be correct. > > What's left-hand side in the context of the code you changed? > Between "what" vs "ent->util" you take "what"? That cannot be > true. Are you referring to the (usually hidden and useless when we > use it as an integer) "hardware capabilities" word as "left" vs the > value of the pointer as "right"? > >> static uintptr_t register_symlink_changes(struct apply_state *state, >> const char *path, >> - uintptr_t what) >> + size_t what) >> { >> struct string_list_item *ent; >> >> @@ -3823,7 +3823,7 @@ static uintptr_t register_symlink_changes(struct apply_state *state, >> ent = string_list_insert(&state->symlink_changes, path); >> ent->util = (void *)0; >> } >> - ent->util = (void *)(what | ((uintptr_t)ent->util)); >> + ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util)); >> return (uintptr_t)ent->util; >> } > > I actually wonder if it results in code that is much easier to > follow if we did: > > * Introduce an "enum apply_symlink_treatment" that has > APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its > possible values; > > * Make register_symlink_changes() and check_symlink_changes() > work with "enum apply_symlink_treatment"; > > * (optional) stop using string_list() to store the symlink_changes; > use strintmap and use strintmap_set() and strintmap_get() to > access its entries, so that the ugly implementation detail > (i.e. "the container type we use only has a (void *) field to > store caller-supplied data, so we cast an integer and a pointer > back and forth") can be safely hidden. > Or strsets -- we only need two. --- >8 --- Subject: [PATCH] apply: use strsets to track symlinks Symlink changes are tracked in a string_list, with the util pointer value indicating whether a symlink is kept or removed. Using fake pointer values requires awkward casts. Use one strset for each type of change instead to simplify and shorten the code. Original-patch-by: Jessica Clarke <jrtc27@jrtc27.com> Signed-off-by: René Scharfe <l.s.r@web.de> --- apply.c | 42 ++++++++---------------------------------- apply.h | 26 +++++++++++--------------- 2 files changed, 19 insertions(+), 49 deletions(-) diff --git a/apply.c b/apply.c index fed195250b..7deb4f79fd 100644 --- a/apply.c +++ b/apply.c @@ -103,7 +103,8 @@ int init_apply_state(struct apply_state *state, state->linenr = 1; string_list_init_nodup(&state->fn_table); string_list_init_nodup(&state->limit_by_name); - string_list_init_nodup(&state->symlink_changes); + strset_init(&state->removed_symlinks); + strset_init(&state->kept_symlinks); strbuf_init(&state->root, 0); git_apply_config(); @@ -117,7 +118,8 @@ int init_apply_state(struct apply_state *state, void clear_apply_state(struct apply_state *state) { string_list_clear(&state->limit_by_name, 0); - string_list_clear(&state->symlink_changes, 0); + strset_clear(&state->removed_symlinks); + strset_clear(&state->kept_symlinks); strbuf_release(&state->root); /* &state->fn_table is cleared at the end of apply_patch() */ @@ -3812,59 +3814,31 @@ static int check_to_create(struct apply_state *state, return 0; } -static uintptr_t register_symlink_changes(struct apply_state *state, - const char *path, - uintptr_t what) -{ - struct string_list_item *ent; - - ent = string_list_lookup(&state->symlink_changes, path); - if (!ent) { - ent = string_list_insert(&state->symlink_changes, path); - ent->util = (void *)0; - } - ent->util = (void *)(what | ((uintptr_t)ent->util)); - return (uintptr_t)ent->util; -} - -static uintptr_t check_symlink_changes(struct apply_state *state, const char *path) -{ - struct string_list_item *ent; - - ent = string_list_lookup(&state->symlink_changes, path); - if (!ent) - return 0; - return (uintptr_t)ent->util; -} - static void prepare_symlink_changes(struct apply_state *state, struct patch *patch) { for ( ; patch; patch = patch->next) { if ((patch->old_name && S_ISLNK(patch->old_mode)) && (patch->is_rename || patch->is_delete)) /* the symlink at patch->old_name is removed */ - register_symlink_changes(state, patch->old_name, APPLY_SYMLINK_GOES_AWAY); + strset_add(&state->removed_symlinks, patch->old_name); if (patch->new_name && S_ISLNK(patch->new_mode)) /* the symlink at patch->new_name is created or remains */ - register_symlink_changes(state, patch->new_name, APPLY_SYMLINK_IN_RESULT); + strset_add(&state->kept_symlinks, patch->new_name); } } static int path_is_beyond_symlink_1(struct apply_state *state, struct strbuf *name) { do { - unsigned int change; - while (--name->len && name->buf[name->len] != '/') ; /* scan backwards */ if (!name->len) break; name->buf[name->len] = '\0'; - change = check_symlink_changes(state, name->buf); - if (change & APPLY_SYMLINK_IN_RESULT) + if (strset_contains(&state->kept_symlinks, name->buf)) return 1; - if (change & APPLY_SYMLINK_GOES_AWAY) + if (strset_contains(&state->removed_symlinks, name->buf)) /* * This cannot be "return 0", because we may * see a new one created at a higher level. diff --git a/apply.h b/apply.h index 16202da160..4052da50c0 100644 --- a/apply.h +++ b/apply.h @@ -4,6 +4,7 @@ #include "hash.h" #include "lockfile.h" #include "string-list.h" +#include "strmap.h" struct repository; @@ -25,20 +26,6 @@ enum apply_verbosity { verbosity_verbose = 1 }; -/* - * We need to keep track of how symlinks in the preimage are - * manipulated by the patches. A patch to add a/b/c where a/b - * is a symlink should not be allowed to affect the directory - * the symlink points at, but if the same patch removes a/b, - * it is perfectly fine, as the patch removes a/b to make room - * to create a directory a/b so that a/b/c can be created. - * - * See also "struct string_list symlink_changes" in "struct - * apply_state". - */ -#define APPLY_SYMLINK_GOES_AWAY 01 -#define APPLY_SYMLINK_IN_RESULT 02 - struct apply_state { const char *prefix; @@ -86,7 +73,16 @@ struct apply_state { /* Various "current state" */ int linenr; /* current line number */ - struct string_list symlink_changes; /* we have to track symlinks */ + /* + * We need to keep track of how symlinks in the preimage are + * manipulated by the patches. A patch to add a/b/c where a/b + * is a symlink should not be allowed to affect the directory + * the symlink points at, but if the same patch removes a/b, + * it is perfectly fine, as the patch removes a/b to make room + * to create a directory a/b so that a/b/c can be created. + */ + struct strset removed_symlinks; + struct strset kept_symlinks; /* * For "diff-stat" like behaviour, we keep track of the biggest change -- 2.34.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello 2022-01-07 12:16 ` René Scharfe @ 2022-01-07 13:00 ` Jessica Clarke 2022-01-07 19:40 ` Junio C Hamano 2022-01-07 23:25 ` Junio C Hamano 2 siblings, 0 replies; 14+ messages in thread From: Jessica Clarke @ 2022-01-07 13:00 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, git On 7 Jan 2022, at 12:16, René Scharfe <l.s.r@web.de> wrote: > > Am 06.01.22 um 23:53 schrieb Junio C Hamano: >> Jessica Clarke <jrtc27@jrtc27.com> writes: >> >>> On CHERI, and thus Arm's Morello prototype, pointers are implemented as >>> hardware capabilities which, as well as having a normal integer address, >>> have additional bounds, permissions and other metadata in a second word. >>> In order to preserve this metadata, uintptr_t is also implemented as a >>> capability, not a plain integer, which causes problems for binary >>> operators, as the metadata preserved in the output can only come from >>> one of the inputs. In most cases this is clear, as normally at least one >>> operand is provably a plain integer, but if both operands are uintptr_t >>> and have no indication they're just plain integers then it is ambiguous, >>> and the current implementation will arbitrarily, but deterministically, >>> pick the left-hand side, due to empirical evidence that it is more >>> likely to be correct. >> >> What's left-hand side in the context of the code you changed? >> Between "what" vs "ent->util" you take "what"? That cannot be >> true. Are you referring to the (usually hidden and useless when we >> use it as an integer) "hardware capabilities" word as "left" vs the >> value of the pointer as "right"? >> >>> static uintptr_t register_symlink_changes(struct apply_state *state, >>> const char *path, >>> - uintptr_t what) >>> + size_t what) >>> { >>> struct string_list_item *ent; >>> >>> @@ -3823,7 +3823,7 @@ static uintptr_t register_symlink_changes(struct apply_state *state, >>> ent = string_list_insert(&state->symlink_changes, path); >>> ent->util = (void *)0; >>> } >>> - ent->util = (void *)(what | ((uintptr_t)ent->util)); >>> + ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util)); >>> return (uintptr_t)ent->util; >>> } >> >> I actually wonder if it results in code that is much easier to >> follow if we did: >> >> * Introduce an "enum apply_symlink_treatment" that has >> APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its >> possible values; >> >> * Make register_symlink_changes() and check_symlink_changes() >> work with "enum apply_symlink_treatment"; >> >> * (optional) stop using string_list() to store the symlink_changes; >> use strintmap and use strintmap_set() and strintmap_get() to >> access its entries, so that the ugly implementation detail >> (i.e. "the container type we use only has a (void *) field to >> store caller-supplied data, so we cast an integer and a pointer >> back and forth") can be safely hidden. >> > Or strsets -- we only need two. > > --- >8 --- > Subject: [PATCH] apply: use strsets to track symlinks > > Symlink changes are tracked in a string_list, with the util pointer > value indicating whether a symlink is kept or removed. Using fake > pointer values requires awkward casts. Use one strset for each type of > change instead to simplify and shorten the code. > > Original-patch-by: Jessica Clarke <jrtc27@jrtc27.com> > Signed-off-by: René Scharfe <l.s.r@web.de> Thanks, this patch makes sense to me. Incidentally, seeing the bigger picture as a result of this patch touching everywhere that used that list, I can see that in fact the existing code would have worked, just with the compiler warning that something potentially iffy was going on. I had assumed ent->util was still sometimes storing an actual pointer, with the low bits being used as flags, as many things tend to do, but in fact it was always NULL plus a couple of flag bits, so both sides of the | always had the same bounds/permissions/tag, that of NULL (i.e. tag cleared as invalid, full bounds). This still looks like a nice cleanup though. Jess > --- > apply.c | 42 ++++++++---------------------------------- > apply.h | 26 +++++++++++--------------- > 2 files changed, 19 insertions(+), 49 deletions(-) > > diff --git a/apply.c b/apply.c > index fed195250b..7deb4f79fd 100644 > --- a/apply.c > +++ b/apply.c > @@ -103,7 +103,8 @@ int init_apply_state(struct apply_state *state, > state->linenr = 1; > string_list_init_nodup(&state->fn_table); > string_list_init_nodup(&state->limit_by_name); > - string_list_init_nodup(&state->symlink_changes); > + strset_init(&state->removed_symlinks); > + strset_init(&state->kept_symlinks); > strbuf_init(&state->root, 0); > > git_apply_config(); > @@ -117,7 +118,8 @@ int init_apply_state(struct apply_state *state, > void clear_apply_state(struct apply_state *state) > { > string_list_clear(&state->limit_by_name, 0); > - string_list_clear(&state->symlink_changes, 0); > + strset_clear(&state->removed_symlinks); > + strset_clear(&state->kept_symlinks); > strbuf_release(&state->root); > > /* &state->fn_table is cleared at the end of apply_patch() */ > @@ -3812,59 +3814,31 @@ static int check_to_create(struct apply_state *state, > return 0; > } > > -static uintptr_t register_symlink_changes(struct apply_state *state, > - const char *path, > - uintptr_t what) > -{ > - struct string_list_item *ent; > - > - ent = string_list_lookup(&state->symlink_changes, path); > - if (!ent) { > - ent = string_list_insert(&state->symlink_changes, path); > - ent->util = (void *)0; > - } > - ent->util = (void *)(what | ((uintptr_t)ent->util)); > - return (uintptr_t)ent->util; > -} > - > -static uintptr_t check_symlink_changes(struct apply_state *state, const char *path) > -{ > - struct string_list_item *ent; > - > - ent = string_list_lookup(&state->symlink_changes, path); > - if (!ent) > - return 0; > - return (uintptr_t)ent->util; > -} > - > static void prepare_symlink_changes(struct apply_state *state, struct patch *patch) > { > for ( ; patch; patch = patch->next) { > if ((patch->old_name && S_ISLNK(patch->old_mode)) && > (patch->is_rename || patch->is_delete)) > /* the symlink at patch->old_name is removed */ > - register_symlink_changes(state, patch->old_name, APPLY_SYMLINK_GOES_AWAY); > + strset_add(&state->removed_symlinks, patch->old_name); > > if (patch->new_name && S_ISLNK(patch->new_mode)) > /* the symlink at patch->new_name is created or remains */ > - register_symlink_changes(state, patch->new_name, APPLY_SYMLINK_IN_RESULT); > + strset_add(&state->kept_symlinks, patch->new_name); > } > } > > static int path_is_beyond_symlink_1(struct apply_state *state, struct strbuf *name) > { > do { > - unsigned int change; > - > while (--name->len && name->buf[name->len] != '/') > ; /* scan backwards */ > if (!name->len) > break; > name->buf[name->len] = '\0'; > - change = check_symlink_changes(state, name->buf); > - if (change & APPLY_SYMLINK_IN_RESULT) > + if (strset_contains(&state->kept_symlinks, name->buf)) > return 1; > - if (change & APPLY_SYMLINK_GOES_AWAY) > + if (strset_contains(&state->removed_symlinks, name->buf)) > /* > * This cannot be "return 0", because we may > * see a new one created at a higher level. > diff --git a/apply.h b/apply.h > index 16202da160..4052da50c0 100644 > --- a/apply.h > +++ b/apply.h > @@ -4,6 +4,7 @@ > #include "hash.h" > #include "lockfile.h" > #include "string-list.h" > +#include "strmap.h" > > struct repository; > > @@ -25,20 +26,6 @@ enum apply_verbosity { > verbosity_verbose = 1 > }; > > -/* > - * We need to keep track of how symlinks in the preimage are > - * manipulated by the patches. A patch to add a/b/c where a/b > - * is a symlink should not be allowed to affect the directory > - * the symlink points at, but if the same patch removes a/b, > - * it is perfectly fine, as the patch removes a/b to make room > - * to create a directory a/b so that a/b/c can be created. > - * > - * See also "struct string_list symlink_changes" in "struct > - * apply_state". > - */ > -#define APPLY_SYMLINK_GOES_AWAY 01 > -#define APPLY_SYMLINK_IN_RESULT 02 > - > struct apply_state { > const char *prefix; > > @@ -86,7 +73,16 @@ struct apply_state { > > /* Various "current state" */ > int linenr; /* current line number */ > - struct string_list symlink_changes; /* we have to track symlinks */ > + /* > + * We need to keep track of how symlinks in the preimage are > + * manipulated by the patches. A patch to add a/b/c where a/b > + * is a symlink should not be allowed to affect the directory > + * the symlink points at, but if the same patch removes a/b, > + * it is perfectly fine, as the patch removes a/b to make room > + * to create a directory a/b so that a/b/c can be created. > + */ > + struct strset removed_symlinks; > + struct strset kept_symlinks; > > /* > * For "diff-stat" like behaviour, we keep track of the biggest change > -- > 2.34.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello 2022-01-07 12:16 ` René Scharfe 2022-01-07 13:00 ` Jessica Clarke @ 2022-01-07 19:40 ` Junio C Hamano 2022-01-08 0:04 ` René Scharfe 2022-01-07 23:25 ` Junio C Hamano 2 siblings, 1 reply; 14+ messages in thread From: Junio C Hamano @ 2022-01-07 19:40 UTC (permalink / raw) To: René Scharfe; +Cc: Jessica Clarke, git René Scharfe <l.s.r@web.de> writes: >> I actually wonder if it results in code that is much easier to >> follow if we did: >> >> * Introduce an "enum apply_symlink_treatment" that has >> APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its >> possible values; >> >> * Make register_symlink_changes() and check_symlink_changes() >> work with "enum apply_symlink_treatment"; >> >> * (optional) stop using string_list() to store the symlink_changes; >> use strintmap and use strintmap_set() and strintmap_get() to >> access its entries, so that the ugly implementation detail >> (i.e. "the container type we use only has a (void *) field to >> store caller-supplied data, so we cast an integer and a pointer >> back and forth") can be safely hidden. >> > Or strsets -- we only need two. True. When we check a path, we make a single look-up of two bit in a single hashtable but now we need two look-ups, but addition, removal and renaming of a symlink would be rare enough to matter either way. Will queue; thanks. > --- >8 --- > Subject: [PATCH] apply: use strsets to track symlinks > > Symlink changes are tracked in a string_list, with the util pointer > value indicating whether a symlink is kept or removed. Using fake > pointer values requires awkward casts. Use one strset for each type of > change instead to simplify and shorten the code. > > Original-patch-by: Jessica Clarke <jrtc27@jrtc27.com> > Signed-off-by: René Scharfe <l.s.r@web.de> > --- > apply.c | 42 ++++++++---------------------------------- > apply.h | 26 +++++++++++--------------- > 2 files changed, 19 insertions(+), 49 deletions(-) > > diff --git a/apply.c b/apply.c > index fed195250b..7deb4f79fd 100644 > --- a/apply.c > +++ b/apply.c > @@ -103,7 +103,8 @@ int init_apply_state(struct apply_state *state, > state->linenr = 1; > string_list_init_nodup(&state->fn_table); > string_list_init_nodup(&state->limit_by_name); > - string_list_init_nodup(&state->symlink_changes); > + strset_init(&state->removed_symlinks); > + strset_init(&state->kept_symlinks); > strbuf_init(&state->root, 0); > > git_apply_config(); > @@ -117,7 +118,8 @@ int init_apply_state(struct apply_state *state, > void clear_apply_state(struct apply_state *state) > { > string_list_clear(&state->limit_by_name, 0); > - string_list_clear(&state->symlink_changes, 0); > + strset_clear(&state->removed_symlinks); > + strset_clear(&state->kept_symlinks); > strbuf_release(&state->root); > > /* &state->fn_table is cleared at the end of apply_patch() */ > @@ -3812,59 +3814,31 @@ static int check_to_create(struct apply_state *state, > return 0; > } > > -static uintptr_t register_symlink_changes(struct apply_state *state, > - const char *path, > - uintptr_t what) > -{ > - struct string_list_item *ent; > - > - ent = string_list_lookup(&state->symlink_changes, path); > - if (!ent) { > - ent = string_list_insert(&state->symlink_changes, path); > - ent->util = (void *)0; > - } > - ent->util = (void *)(what | ((uintptr_t)ent->util)); > - return (uintptr_t)ent->util; > -} > - > -static uintptr_t check_symlink_changes(struct apply_state *state, const char *path) > -{ > - struct string_list_item *ent; > - > - ent = string_list_lookup(&state->symlink_changes, path); > - if (!ent) > - return 0; > - return (uintptr_t)ent->util; > -} > - > static void prepare_symlink_changes(struct apply_state *state, struct patch *patch) > { > for ( ; patch; patch = patch->next) { > if ((patch->old_name && S_ISLNK(patch->old_mode)) && > (patch->is_rename || patch->is_delete)) > /* the symlink at patch->old_name is removed */ > - register_symlink_changes(state, patch->old_name, APPLY_SYMLINK_GOES_AWAY); > + strset_add(&state->removed_symlinks, patch->old_name); > > if (patch->new_name && S_ISLNK(patch->new_mode)) > /* the symlink at patch->new_name is created or remains */ > - register_symlink_changes(state, patch->new_name, APPLY_SYMLINK_IN_RESULT); > + strset_add(&state->kept_symlinks, patch->new_name); > } > } > > static int path_is_beyond_symlink_1(struct apply_state *state, struct strbuf *name) > { > do { > - unsigned int change; > - > while (--name->len && name->buf[name->len] != '/') > ; /* scan backwards */ > if (!name->len) > break; > name->buf[name->len] = '\0'; > - change = check_symlink_changes(state, name->buf); > - if (change & APPLY_SYMLINK_IN_RESULT) > + if (strset_contains(&state->kept_symlinks, name->buf)) > return 1; > - if (change & APPLY_SYMLINK_GOES_AWAY) > + if (strset_contains(&state->removed_symlinks, name->buf)) > /* > * This cannot be "return 0", because we may > * see a new one created at a higher level. > diff --git a/apply.h b/apply.h > index 16202da160..4052da50c0 100644 > --- a/apply.h > +++ b/apply.h > @@ -4,6 +4,7 @@ > #include "hash.h" > #include "lockfile.h" > #include "string-list.h" > +#include "strmap.h" > > struct repository; > > @@ -25,20 +26,6 @@ enum apply_verbosity { > verbosity_verbose = 1 > }; > > -/* > - * We need to keep track of how symlinks in the preimage are > - * manipulated by the patches. A patch to add a/b/c where a/b > - * is a symlink should not be allowed to affect the directory > - * the symlink points at, but if the same patch removes a/b, > - * it is perfectly fine, as the patch removes a/b to make room > - * to create a directory a/b so that a/b/c can be created. > - * > - * See also "struct string_list symlink_changes" in "struct > - * apply_state". > - */ > -#define APPLY_SYMLINK_GOES_AWAY 01 > -#define APPLY_SYMLINK_IN_RESULT 02 > - > struct apply_state { > const char *prefix; > > @@ -86,7 +73,16 @@ struct apply_state { > > /* Various "current state" */ > int linenr; /* current line number */ > - struct string_list symlink_changes; /* we have to track symlinks */ > + /* > + * We need to keep track of how symlinks in the preimage are > + * manipulated by the patches. A patch to add a/b/c where a/b > + * is a symlink should not be allowed to affect the directory > + * the symlink points at, but if the same patch removes a/b, > + * it is perfectly fine, as the patch removes a/b to make room > + * to create a directory a/b so that a/b/c can be created. > + */ > + struct strset removed_symlinks; > + struct strset kept_symlinks; > > /* > * For "diff-stat" like behaviour, we keep track of the biggest change > -- > 2.34.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello 2022-01-07 19:40 ` Junio C Hamano @ 2022-01-08 0:04 ` René Scharfe 2022-01-08 0:51 ` Junio C Hamano 0 siblings, 1 reply; 14+ messages in thread From: René Scharfe @ 2022-01-08 0:04 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jessica Clarke, git Am 07.01.22 um 20:40 schrieb Junio C Hamano: > René Scharfe <l.s.r@web.de> writes: > >>> I actually wonder if it results in code that is much easier to >>> follow if we did: >>> >>> * Introduce an "enum apply_symlink_treatment" that has >>> APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its >>> possible values; >>> >>> * Make register_symlink_changes() and check_symlink_changes() >>> work with "enum apply_symlink_treatment"; >>> >>> * (optional) stop using string_list() to store the symlink_changes; >>> use strintmap and use strintmap_set() and strintmap_get() to >>> access its entries, so that the ugly implementation detail >>> (i.e. "the container type we use only has a (void *) field to >>> store caller-supplied data, so we cast an integer and a pointer >>> back and forth") can be safely hidden. >>> >> Or strsets -- we only need two. > > True. > > When we check a path, we make a single look-up of two bit in a > single hashtable but now we need two look-ups, but addition, removal > and renaming of a symlink would be rare enough to matter either way. Hmm, symlinks changes are rare, but this only affects the register phase (which should be noticeably slow for the string_list based code with its O(n*log(n)) lookup per registered symlink if there were a lot of them). But the final lookups are done for each path _component_, of any file type. I suspect that two lookups in (sparsely populated) hash tables are still fast enough. (At least I couldn't measure any difference with git apply and a patch between v2.33.0 and v2.34.0.) René ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello 2022-01-08 0:04 ` René Scharfe @ 2022-01-08 0:51 ` Junio C Hamano 0 siblings, 0 replies; 14+ messages in thread From: Junio C Hamano @ 2022-01-08 0:51 UTC (permalink / raw) To: René Scharfe; +Cc: Jessica Clarke, git René Scharfe <l.s.r@web.de> writes: >>> Or strsets -- we only need two. >> >> True. >> >> When we check a path, we make a single look-up of two bit in a >> single hashtable but now we need two look-ups, but addition, removal >> and renaming of a symlink would be rare enough to matter either way. > > Hmm, symlinks changes are rare, but this only affects the register phase > (which should be noticeably slow for the string_list based code with its > O(n*log(n)) lookup per registered symlink if there were a lot of them). > But the final lookups are done for each path _component_, of any file > type. > > I suspect that two lookups in (sparsely populated) hash tables are still > fast enough. (At least I couldn't measure any difference with git apply > and a patch between v2.33.0 and v2.34.0.) Yup. I do not think it matters, and the resulting code is certainly a lot simpler and easier to read ;-) Thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello 2022-01-07 12:16 ` René Scharfe 2022-01-07 13:00 ` Jessica Clarke 2022-01-07 19:40 ` Junio C Hamano @ 2022-01-07 23:25 ` Junio C Hamano 2 siblings, 0 replies; 14+ messages in thread From: Junio C Hamano @ 2022-01-07 23:25 UTC (permalink / raw) To: René Scharfe; +Cc: Jessica Clarke, git René Scharfe <l.s.r@web.de> writes: > Subject: [PATCH] apply: use strsets to track symlinks > > Symlink changes are tracked in a string_list, with the util pointer > value indicating whether a symlink is kept or removed. Using fake > pointer values requires awkward casts. Use one strset for each type of > change instead to simplify and shorten the code. > > Original-patch-by: Jessica Clarke <jrtc27@jrtc27.com> This may be recording a misleading credit. The original patch certainly was a good input to make us realize that we have a problem, but has no contribution to the final code. > Signed-off-by: René Scharfe <l.s.r@web.de> ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2022-01-08 0:52 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-01-05 13:23 [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello Jessica Clarke 2022-01-05 16:39 ` Konstantin Khomoutov 2022-01-05 16:40 ` Jessica Clarke 2022-01-06 22:50 ` Taylor Blau 2022-01-06 22:57 ` Jessica Clarke 2022-01-06 22:53 ` Junio C Hamano 2022-01-06 23:02 ` Jessica Clarke 2022-01-06 23:41 ` Junio C Hamano 2022-01-07 12:16 ` René Scharfe 2022-01-07 13:00 ` Jessica Clarke 2022-01-07 19:40 ` Junio C Hamano 2022-01-08 0:04 ` René Scharfe 2022-01-08 0:51 ` Junio C Hamano 2022-01-07 23:25 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).