* [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` @ 2024-04-19 9:51 Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 01/11] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt ` (14 more replies) 0 siblings, 15 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-19 9:51 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 2808 bytes --] Hi, when starting up, Git will initialize `the_repository` by calling `initialize_the_repository()`. Part of this is also that we set the hash algo of `the_repository` to SHA1, which implicitly sets `the_hash_algo` because it is a macro expanding to `the_repository->hash_algo`. Usually, we eventually set up the correct hash algorithm here once we have properly set up `the_repository` via `setup_git_directory()`. But in some commands we actually don't require a Git repository, and thus we will leave the SHA1 hash algorithm in place. This has led to some subtle bugs when the context really asks for a SHA256 repository, which this patch series corrects for most of the part. Some commands need further work, like for example git-diff(1), where the user might want to have the ability to pick a hash function when run outside of a repository. Ultimately, the last patch then drops the setup of the fallback hash algorithm completely. This will cause `the_hash_algo` to be a `NULL` pointer unless explicitly configured, and thus we now start to crash when it gets accessed without doing so beforehand. This is a rather risky thing to do, but it does catch bugs where we might otherwise accidentally do the wrong thing. And even though I think it is the right thing to do even conceptually, I'd be okay to drop it if folks think that the risk is not worth it. Patrick Patrick Steinhardt (11): path: harden validation of HEAD with non-standard hashes parse-options-cb: only abbreviate hashes when hash algo is known attr: don't recompute default attribute source attr: fix BUG() when parsing attrs outside of repo remote-curl: fix parsing of detached SHA256 heads builtin/rev-parse: allow shortening to more than 40 hex characters builtin/blame: don't access potentially unitialized `the_hash_algo` builtin/bundle: abort "verify" early when there is no repository builtin/diff: explicitly set hash algo when there is no repo builtin/shortlog: don't set up revisions without repo repository: stop setting SHA1 as the default object hash attr.c | 31 ++++++++++++++++++++++--------- builtin/blame.c | 5 ++--- builtin/bundle.c | 5 +++++ builtin/diff.c | 9 +++++++++ builtin/rev-parse.c | 5 ++--- builtin/shortlog.c | 2 +- parse-options-cb.c | 3 ++- path.c | 2 +- remote-curl.c | 19 ++++++++++++++++++- repository.c | 2 -- t/t0003-attributes.sh | 15 +++++++++++++++ t/t0040-parse-options.sh | 17 +++++++++++++++++ t/t1500-rev-parse.sh | 6 ++++++ t/t5550-http-fetch-dumb.sh | 15 +++++++++++++++ 14 files changed, 115 insertions(+), 21 deletions(-) -- 2.44.GIT [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH 01/11] path: harden validation of HEAD with non-standard hashes 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt @ 2024-04-19 9:51 ` Patrick Steinhardt 2024-04-19 19:03 ` brian m. carlson 2024-04-22 16:15 ` Junio C Hamano 2024-04-19 9:51 ` [PATCH 02/11] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt ` (13 subsequent siblings) 14 siblings, 2 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-19 9:51 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 2277 bytes --] The `validate_headref()` function takes a path to a supposed "HEAD" file and checks whether its format is something that we understand. It is used as part of our repository discovery to check whether a specific directory is a Git directory or not. Part of the validation is a check for a detached HEAD that contains a plain object ID. To do this validation we use `get_oid_hex()`, which relies on `the_hash_algo`. At this point in time the hash algo cannot yet be initialized though because we didn't yet read the Git config. Consequently, it will always be the SHA1 hash algorithm. In practice this works alright because `get_oid_hex()` only ends up checking whether the prefix of the buffer is a valid object ID. And because SHA1 is shorter than SHA256, the function will successfully parse SHA256 object IDs, as well. It is somewhat fragile though and not really the intent to only check for SHA1. With this in mind, harden the code to use `get_oid_hex_any()` to check whether the "HEAD" file parses as any known hash. One might be hard pressed to tighten the check even further and fully validate the file contents, not only the prefix. In practice though that wouldn't make a lot of sense as it could be that the repository uses a hash function that produces longer hashes than SHA256, but which the current version of Git doesn't understand yet. We'd still want to detect the repository as proper Git repository in that case, and we will fail eventually with a proper error message that the hash isn't understood when trying to set up the repostiory format. It follows that we could just leave the current code intact, as in practice the code change doesn't have any user visible impact. But it also prepares us for `the_hash_algo` being unset when there is no repositroy. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- path.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/path.c b/path.c index 67229edb9c..cc02165530 100644 --- a/path.c +++ b/path.c @@ -693,7 +693,7 @@ int validate_headref(const char *path) /* * Is this a detached HEAD? */ - if (!get_oid_hex(buffer, &oid)) + if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN) return 0; return -1; -- 2.44.GIT [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH 01/11] path: harden validation of HEAD with non-standard hashes 2024-04-19 9:51 ` [PATCH 01/11] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt @ 2024-04-19 19:03 ` brian m. carlson 2024-04-22 4:56 ` Patrick Steinhardt 2024-04-22 16:15 ` Junio C Hamano 1 sibling, 1 reply; 67+ messages in thread From: brian m. carlson @ 2024-04-19 19:03 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git [-- Attachment #1: Type: text/plain, Size: 432 bytes --] On 2024-04-19 at 09:51:10, Patrick Steinhardt wrote: > It follows that we could just leave the current code intact, as in > practice the code change doesn't have any user visible impact. But it > also prepares us for `the_hash_algo` being unset when there is no > repositroy. The patch looks fine and is well explained, but you have a typo ("repositroy"). -- brian m. carlson (they/them or he/him) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 01/11] path: harden validation of HEAD with non-standard hashes 2024-04-19 19:03 ` brian m. carlson @ 2024-04-22 4:56 ` Patrick Steinhardt 0 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-22 4:56 UTC (permalink / raw) To: brian m. carlson, git [-- Attachment #1: Type: text/plain, Size: 558 bytes --] On Fri, Apr 19, 2024 at 07:03:33PM +0000, brian m. carlson wrote: > On 2024-04-19 at 09:51:10, Patrick Steinhardt wrote: > > It follows that we could just leave the current code intact, as in > > practice the code change doesn't have any user visible impact. But it > > also prepares us for `the_hash_algo` being unset when there is no > > repositroy. > > The patch looks fine and is well explained, but you have a typo > ("repositroy"). Thanks, fixed! I'll refrain from sending out another version just to address this typo though. Patrick [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 01/11] path: harden validation of HEAD with non-standard hashes 2024-04-19 9:51 ` [PATCH 01/11] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt 2024-04-19 19:03 ` brian m. carlson @ 2024-04-22 16:15 ` Junio C Hamano 2024-04-23 4:50 ` Patrick Steinhardt 1 sibling, 1 reply; 67+ messages in thread From: Junio C Hamano @ 2024-04-22 16:15 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git Patrick Steinhardt <ps@pks.im> writes: > The `validate_headref()` function takes a path to a supposed "HEAD" file > and checks whether its format is something that we understand. It is > used as part of our repository discovery to check whether a specific > directory is a Git directory or not. > > Part of the validation is a check for a detached HEAD that contains a > plain object ID. To do this validation we use `get_oid_hex()`, which > relies on `the_hash_algo`. At this point in time the hash algo cannot > yet be initialized though because we didn't yet read the Git config. > Consequently, it will always be the SHA1 hash algorithm. > > In practice this works alright because `get_oid_hex()` only ends up > checking whether the prefix of the buffer is a valid object ID. And > because SHA1 is shorter than SHA256, the function will successfully > parse SHA256 object IDs, as well. > > It is somewhat fragile though and not really the intent to only check > for SHA1. With this in mind, harden the code to use `get_oid_hex_any()` > to check whether the "HEAD" file parses as any known hash. All makes sense, and given the above, I strongly suspect that we would want to make the validate_headref() function a file-scope static in setup.c to make sure it is only called/callable from the repository discovery codepath. Especially that if somebody calls this function again after we find out that the repository uses SHA-256, we would want to let the caller know that the detached HEAD records SHA-1 and we are in an inconsistent state. > It follows that we could just leave the current code intact, as in > practice the code change doesn't have any user visible impact. But it > also prepares us for `the_hash_algo` being unset when there is no > repositroy. Or perhaps we use get_oid_hex_any() != GIT_HASH_UNKNOWN when the_hash_algo has not been determined, and use !get_oid_hex() after we have determined which algorithm we are using? It may be what you did in a later step in the series, so let me read on. Thanks. > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > path.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/path.c b/path.c > index 67229edb9c..cc02165530 100644 > --- a/path.c > +++ b/path.c > @@ -693,7 +693,7 @@ int validate_headref(const char *path) > /* > * Is this a detached HEAD? > */ > - if (!get_oid_hex(buffer, &oid)) > + if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN) > return 0; > > return -1; ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 01/11] path: harden validation of HEAD with non-standard hashes 2024-04-22 16:15 ` Junio C Hamano @ 2024-04-23 4:50 ` Patrick Steinhardt 2024-04-23 16:54 ` Junio C Hamano 0 siblings, 1 reply; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 4:50 UTC (permalink / raw) To: Junio C Hamano; +Cc: git [-- Attachment #1: Type: text/plain, Size: 3075 bytes --] On Mon, Apr 22, 2024 at 09:15:41AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > The `validate_headref()` function takes a path to a supposed "HEAD" file > > and checks whether its format is something that we understand. It is > > used as part of our repository discovery to check whether a specific > > directory is a Git directory or not. > > > > Part of the validation is a check for a detached HEAD that contains a > > plain object ID. To do this validation we use `get_oid_hex()`, which > > relies on `the_hash_algo`. At this point in time the hash algo cannot > > yet be initialized though because we didn't yet read the Git config. > > Consequently, it will always be the SHA1 hash algorithm. > > > > In practice this works alright because `get_oid_hex()` only ends up > > checking whether the prefix of the buffer is a valid object ID. And > > because SHA1 is shorter than SHA256, the function will successfully > > parse SHA256 object IDs, as well. > > > > It is somewhat fragile though and not really the intent to only check > > for SHA1. With this in mind, harden the code to use `get_oid_hex_any()` > > to check whether the "HEAD" file parses as any known hash. > > All makes sense, and given the above, I strongly suspect that we > would want to make the validate_headref() function a file-scope > static in setup.c to make sure it is only called/callable from the > repository discovery codepath. Especially that if somebody calls > this function again after we find out that the repository uses > SHA-256, we would want to let the caller know that the detached HEAD > records SHA-1 and we are in an inconsistent state. Fair enough, we can definitely do so. It only has a single callsite anyway. > > It follows that we could just leave the current code intact, as in > > practice the code change doesn't have any user visible impact. But it > > also prepares us for `the_hash_algo` being unset when there is no > > repositroy. > > Or perhaps we use get_oid_hex_any() != GIT_HASH_UNKNOWN when > the_hash_algo has not been determined, and use !get_oid_hex() after > we have determined which algorithm we are using? It may be what you > did in a later step in the series, so let me read on. I didn't, and given that it's only used from `is_git_directory()` it would probably not make much sense, either. I'll rather add another patch on top that moves the function around to clarify that it is only expected to be called from "setup.c". Patrick > Thanks. > > > Signed-off-by: Patrick Steinhardt <ps@pks.im> > > --- > > path.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/path.c b/path.c > > index 67229edb9c..cc02165530 100644 > > --- a/path.c > > +++ b/path.c > > @@ -693,7 +693,7 @@ int validate_headref(const char *path) > > /* > > * Is this a detached HEAD? > > */ > > - if (!get_oid_hex(buffer, &oid)) > > + if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN) > > return 0; > > > > return -1; [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 01/11] path: harden validation of HEAD with non-standard hashes 2024-04-23 4:50 ` Patrick Steinhardt @ 2024-04-23 16:54 ` Junio C Hamano 0 siblings, 0 replies; 67+ messages in thread From: Junio C Hamano @ 2024-04-23 16:54 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git Patrick Steinhardt <ps@pks.im> writes: >> All makes sense, and given the above, I strongly suspect that we >> would want to make the validate_headref() function a file-scope >> static in setup.c to make sure it is only called/callable from the >> repository discovery codepath. Especially that if somebody calls >> this function again after we find out that the repository uses >> SHA-256, we would want to let the caller know that the detached HEAD >> records SHA-1 and we are in an inconsistent state. > > Fair enough, we can definitely do so. It only has a single callsite > anyway. I was wondering if I was missing a future plans to call it from other code paths that are triggered after the set-up was done. If that is not the case, we should do so. It matters more for the future callers than the current ones. They _all_ have to be aware of the deliberate looseness of the check here. Thanks. ^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH 02/11] parse-options-cb: only abbreviate hashes when hash algo is known 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 01/11] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt @ 2024-04-19 9:51 ` Patrick Steinhardt 2024-04-23 0:30 ` Justin Tobler 2024-04-19 9:51 ` [PATCH 03/11] attr: don't recompute default attribute source Patrick Steinhardt ` (12 subsequent siblings) 14 siblings, 1 reply; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-19 9:51 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 2444 bytes --] The `OPT__ABBREV()` option can be used to add an option that abbreviates object IDs. When given an length longer than `the_hash_algo->hexsz`, then it will instead set the length to that maximum length. It may not always be guaranteed that we have `the_hash_algo` initialized properly as the hash algortihm can only be set up after we have set up `the_repository`. In that case, the hash would always be truncated to the hex length of SHA1, which may not be what the user desires. In practice it's not a problem as all commands that use `OPT__ABBREV()` also have `RUN_SETUP` set and thus cannot work without a repository. Consequently, both `the_repository` and `the_hash_algo` would be properly set up. Regardless of that, harden the code to not truncate the length when we didn't set up a repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- parse-options-cb.c | 3 ++- t/t0040-parse-options.sh | 17 +++++++++++++++++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/parse-options-cb.c b/parse-options-cb.c index bdc7fae497..d99d688d3c 100644 --- a/parse-options-cb.c +++ b/parse-options-cb.c @@ -7,6 +7,7 @@ #include "environment.h" #include "gettext.h" #include "object-name.h" +#include "setup.h" #include "string-list.h" #include "strvec.h" #include "oid-array.h" @@ -29,7 +30,7 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset) opt->long_name); if (v && v < MINIMUM_ABBREV) v = MINIMUM_ABBREV; - else if (v > the_hash_algo->hexsz) + else if (startup_info->have_repository && v > the_hash_algo->hexsz) v = the_hash_algo->hexsz; } *(int *)(opt->value) = v; diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh index 8bb2a8b453..45a773642f 100755 --- a/t/t0040-parse-options.sh +++ b/t/t0040-parse-options.sh @@ -176,6 +176,23 @@ test_expect_success 'long options' ' test_cmp expect output ' +test_expect_success 'abbreviate to something longer than SHA1 length' ' + cat >expect <<-EOF && + boolean: 0 + integer: 0 + magnitude: 0 + timestamp: 0 + string: (not set) + abbrev: 100 + verbose: -1 + quiet: 0 + dry run: no + file: (not set) + EOF + test-tool parse-options --abbrev=100 >output && + test_cmp expect output +' + test_expect_success 'missing required value' ' cat >expect <<-\EOF && error: switch `s'\'' requires a value -- 2.44.GIT [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH 02/11] parse-options-cb: only abbreviate hashes when hash algo is known 2024-04-19 9:51 ` [PATCH 02/11] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt @ 2024-04-23 0:30 ` Justin Tobler 0 siblings, 0 replies; 67+ messages in thread From: Justin Tobler @ 2024-04-23 0:30 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git On 24/04/19 11:51AM, Patrick Steinhardt wrote: > The `OPT__ABBREV()` option can be used to add an option that abbreviates > object IDs. When given an length longer than `the_hash_algo->hexsz`, s/an length/a length/ > then it will instead set the length to that maximum length. > > It may not always be guaranteed that we have `the_hash_algo` initialized > properly as the hash algortihm can only be set up after we have set up s/algortihm/algorithm/ > `the_repository`. In that case, the hash would always be truncated to > the hex length of SHA1, which may not be what the user desires. > > In practice it's not a problem as all commands that use `OPT__ABBREV()` > also have `RUN_SETUP` set and thus cannot work without a repository. > Consequently, both `the_repository` and `the_hash_algo` would be > properly set up. > > Regardless of that, harden the code to not truncate the length when we > didn't set up a repository. > > Signed-off-by: Patrick Steinhardt <ps@pks.im> ... A couple of typos I noticed. -Justin ^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH 03/11] attr: don't recompute default attribute source 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 01/11] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 02/11] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt @ 2024-04-19 9:51 ` Patrick Steinhardt 2024-04-23 0:32 ` Justin Tobler 2024-04-19 9:51 ` [PATCH 04/11] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt ` (11 subsequent siblings) 14 siblings, 1 reply; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-19 9:51 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 2889 bytes --] The `default_attr_source()` function lazily computes the attr source supposedly once, only. This is done via a static variable `attr_source` that contains the resolved object ID of the attr source's tree. If the variable is the null object ID then we try to look up the attr source, otherwise we skip over it. This has approach is flawed though: the variable will never be set to anything else but the null object ID in case there is no attr source. Consequently, we re-compute the information on every call. And in the worst case, when we silently ignore bad trees, this will cause us to try and look up the treeish every single time. Improve this by introducing a separate variable `has_attr_source` to track whether we already computed the attr source and, if so, whether we have an attr source or not. This also allows us to convert the `ignore_bad_attr_tree` to not be static anymore as the code will only be executed once anyway. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- attr.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/attr.c b/attr.c index 679e42258c..9d911aeb31 100644 --- a/attr.c +++ b/attr.c @@ -1206,15 +1206,16 @@ static void collect_some_attrs(struct index_state *istate, } static const char *default_attr_source_tree_object_name; -static int ignore_bad_attr_tree; void set_git_attr_source(const char *tree_object_name) { default_attr_source_tree_object_name = xstrdup(tree_object_name); } -static void compute_default_attr_source(struct object_id *attr_source) +static int compute_default_attr_source(struct object_id *attr_source) { + int ignore_bad_attr_tree = 0; + if (!default_attr_source_tree_object_name) default_attr_source_tree_object_name = getenv(GIT_ATTR_SOURCE_ENVIRONMENT); @@ -1230,22 +1231,28 @@ static void compute_default_attr_source(struct object_id *attr_source) ignore_bad_attr_tree = 1; } - if (!default_attr_source_tree_object_name || !is_null_oid(attr_source)) - return; + if (!default_attr_source_tree_object_name) + return 0; if (repo_get_oid_treeish(the_repository, default_attr_source_tree_object_name, - attr_source) && !ignore_bad_attr_tree) - die(_("bad --attr-source or GIT_ATTR_SOURCE")); + attr_source)) { + if (!ignore_bad_attr_tree) + die(_("bad --attr-source or GIT_ATTR_SOURCE")); + return 0; + } + + return 1; } static struct object_id *default_attr_source(void) { static struct object_id attr_source; + static int has_attr_source = -1; - if (is_null_oid(&attr_source)) - compute_default_attr_source(&attr_source); - if (is_null_oid(&attr_source)) + if (has_attr_source < 0) + has_attr_source = compute_default_attr_source(&attr_source); + if (!has_attr_source) return NULL; return &attr_source; } -- 2.44.GIT [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH 03/11] attr: don't recompute default attribute source 2024-04-19 9:51 ` [PATCH 03/11] attr: don't recompute default attribute source Patrick Steinhardt @ 2024-04-23 0:32 ` Justin Tobler 0 siblings, 0 replies; 67+ messages in thread From: Justin Tobler @ 2024-04-23 0:32 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git On 24/04/19 11:51AM, Patrick Steinhardt wrote: > The `default_attr_source()` function lazily computes the attr source > supposedly once, only. This is done via a static variable `attr_source` > that contains the resolved object ID of the attr source's tree. If the > variable is the null object ID then we try to look up the attr source, > otherwise we skip over it. > > This has approach is flawed though: the variable will never be set to Remove "has" -Justin > anything else but the null object ID in case there is no attr source. > Consequently, we re-compute the information on every call. And in the > worst case, when we silently ignore bad trees, this will cause us to try > and look up the treeish every single time. > > Improve this by introducing a separate variable `has_attr_source` to > track whether we already computed the attr source and, if so, whether we > have an attr source or not. > > This also allows us to convert the `ignore_bad_attr_tree` to not be > static anymore as the code will only be executed once anyway. > > Signed-off-by: Patrick Steinhardt <ps@pks.im> ... ^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH 04/11] attr: fix BUG() when parsing attrs outside of repo 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (2 preceding siblings ...) 2024-04-19 9:51 ` [PATCH 03/11] attr: don't recompute default attribute source Patrick Steinhardt @ 2024-04-19 9:51 ` Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 05/11] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt ` (10 subsequent siblings) 14 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-19 9:51 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 2065 bytes --] If either the `--attr-source` option or the `GIT_ATTR_SOURCE` envvar are set, then `compute_default_attr_source()` will try to look up the value as a treeish. It is possible to hit that function while outside of a Git repository though, for example when using `git grep --no-index`. In that case, Git will hit a bug because we try to look up the main ref store outside of a repository. Handle the case gracefully and detect when we try to look up an attr source without a repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- attr.c | 6 ++++++ t/t0003-attributes.sh | 15 +++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/attr.c b/attr.c index 9d911aeb31..4bd09bcb4b 100644 --- a/attr.c +++ b/attr.c @@ -1234,6 +1234,12 @@ static int compute_default_attr_source(struct object_id *attr_source) if (!default_attr_source_tree_object_name) return 0; + if (!startup_info->have_repository) { + if (!ignore_bad_attr_tree) + die(_("cannot use --attr-source or GIT_ATTR_SOURCE without repo")); + return 0; + } + if (repo_get_oid_treeish(the_repository, default_attr_source_tree_object_name, attr_source)) { diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh index 774b52c298..3efdec54dd 100755 --- a/t/t0003-attributes.sh +++ b/t/t0003-attributes.sh @@ -428,6 +428,21 @@ test_expect_success 'precedence of --attr-source, GIT_ATTR_SOURCE, then attr.tre ) ' +test_expect_success 'diff without repository with attr source' ' + mkdir -p "$TRASH_DIRECTORY/outside/nongit" && + ( + cd "$TRASH_DIRECTORY/outside/nongit" && + GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/outside" && + export GIT_CEILING_DIRECTORIES && + touch file && + cat >expect <<-EOF && + fatal: cannot use --attr-source or GIT_ATTR_SOURCE without repo + EOF + test_must_fail env GIT_ATTR_SOURCE=HEAD git grep --no-index foo file 2>err && + test_cmp expect err + ) +' + test_expect_success 'bare repository: with --source' ' ( cd bare.git && -- 2.44.GIT [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 05/11] remote-curl: fix parsing of detached SHA256 heads 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (3 preceding siblings ...) 2024-04-19 9:51 ` [PATCH 04/11] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt @ 2024-04-19 9:51 ` Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 06/11] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt ` (9 subsequent siblings) 14 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-19 9:51 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 3671 bytes --] The dumb HTTP transport tries to read the remote HEAD reference by downloading the "HEAD" file and then parsing it via `http_fetch_ref()`. This function will either parse the file as an object ID in case it is exactly `the_hash_algo->hexsz` long, or otherwise it will check whether the reference starts with "ref :" and parse it as a symbolic ref. This is broken when parsing detached HEADs of a remote SHA256 repository because we never update `the_hash_algo` to the discovered remote object hash. Consequently, `the_hash_algo` will always be the fallback SHA1 hash algorithm, which will cause us to fail parsing HEAD altogteher when it contains a SHA256 object ID. Fix this issue by setting up `the_hash_algo` via `repo_set_hash_algo()`. While at it, let's make the expected SHA1 fallback explicit in our code, which also addresses an upcoming issue where we are going to remove the SHA1 fallback for `the_hash_algo`. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- remote-curl.c | 19 ++++++++++++++++++- t/t5550-http-fetch-dumb.sh | 15 +++++++++++++++ 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/remote-curl.c b/remote-curl.c index 0b6d7815fd..004b707fdf 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -266,12 +266,23 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push) return list; } +/* + * Try to detect the hash algorithm used by the remote repository when using + * the dumb HTTP transport. As dumb transports cannot tell us the object hash + * directly have to derive it from the advertised ref lengths. + */ static const struct git_hash_algo *detect_hash_algo(struct discovery *heads) { const char *p = memchr(heads->buf, '\t', heads->len); int algo; + + /* + * In case the remote has no refs we have no way to reliably determine + * the object hash used by that repository. In that case we simply fall + * back to SHA1, which may or may not be correct. + */ if (!p) - return the_hash_algo; + return &hash_algos[GIT_HASH_SHA1]; algo = hash_algo_by_length((p - heads->buf) / 2); if (algo == GIT_HASH_UNKNOWN) @@ -295,6 +306,12 @@ static struct ref *parse_info_refs(struct discovery *heads) "is this a git repository?", transport_anonymize_url(url.buf)); + /* + * Set the repository's hash algo to whatever we have just detected. + * This ensures that we can correctly parse the remote references. + */ + repo_set_hash_algo(the_repository, hash_algo_by_ptr(options.hash_algo)); + data = heads->buf; start = NULL; mid = data; diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh index 4c3b32785d..5f16cbc58d 100755 --- a/t/t5550-http-fetch-dumb.sh +++ b/t/t5550-http-fetch-dumb.sh @@ -55,6 +55,21 @@ test_expect_success 'list refs from outside any repository' ' test_cmp expect actual ' + +test_expect_success 'list detached HEAD from outside any repository' ' + git clone --mirror "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \ + "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" && + git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" \ + update-ref --no-deref HEAD refs/heads/main && + git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" update-server-info && + cat >expect <<-EOF && + $(git rev-parse main) HEAD + $(git rev-parse main) refs/heads/main + EOF + nongit git ls-remote "$HTTPD_URL/dumb/repo-detached.git" >actual && + test_cmp expect actual +' + test_expect_success 'create password-protected repository' ' mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH/auth/dumb/" && cp -Rf "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \ -- 2.44.GIT [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 06/11] builtin/rev-parse: allow shortening to more than 40 hex characters 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (4 preceding siblings ...) 2024-04-19 9:51 ` [PATCH 05/11] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt @ 2024-04-19 9:51 ` Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 07/11] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt ` (8 subsequent siblings) 14 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-19 9:51 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 1995 bytes --] The `--short=` option for git-rev-parse(1) allows the user to specify to how many characters object IDs should be shortened to. The option is broken though for SHA256 repositories because we set the maximum allowed hash size to `the_hash_algo->hexsz` before we have even set up the repo. Consequently, `the_hash_algo` will always be SHA1 and thus we truncate every hash after at most 40 characters. Fix this by accessing `the_hash_algo` only after we have set up the repo. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/rev-parse.c | 5 ++--- t/t1500-rev-parse.sh | 6 ++++++ 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c index 624182e507..d7b87c605e 100644 --- a/builtin/rev-parse.c +++ b/builtin/rev-parse.c @@ -687,7 +687,6 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix) const char *name = NULL; struct object_context unused; struct strbuf buf = STRBUF_INIT; - const int hexsz = the_hash_algo->hexsz; int seen_end_of_options = 0; enum format_type format = FORMAT_DEFAULT; @@ -863,8 +862,8 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix) abbrev = strtoul(arg, NULL, 10); if (abbrev < MINIMUM_ABBREV) abbrev = MINIMUM_ABBREV; - else if (hexsz <= abbrev) - abbrev = hexsz; + else if ((int)the_hash_algo->hexsz <= abbrev) + abbrev = the_hash_algo->hexsz; continue; } if (!strcmp(arg, "--sq")) { diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh index a669e592f1..30c31918fd 100755 --- a/t/t1500-rev-parse.sh +++ b/t/t1500-rev-parse.sh @@ -304,4 +304,10 @@ test_expect_success 'rev-parse --bisect includes bad, excludes good' ' test_cmp expect actual ' +test_expect_success '--short= truncates to the actual hash length' ' + git rev-parse HEAD >expect && + git rev-parse --short=100 HEAD >actual && + test_cmp expect actual +' + test_done -- 2.44.GIT [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 07/11] builtin/blame: don't access potentially unitialized `the_hash_algo` 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (5 preceding siblings ...) 2024-04-19 9:51 ` [PATCH 06/11] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt @ 2024-04-19 9:51 ` Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 08/11] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt ` (7 subsequent siblings) 14 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-19 9:51 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 2001 bytes --] We access `the_hash_algo` in git-blame(1) before we have executed `parse_options_start()`, which may not be properly set up in case we have no repository. This is fine for most of the part because all the call paths that lead to it (git-blame(1), git-annotate(1) as well as git-pick-axe(1)) specify `RUN_SETUP` and thus require a repository. There is one exception though, namely when passing `-h` to print the help. Here we will access `the_hash_algo` even if there is no repo. This works fine right now because `the_hash_algo` gets sets up to point to the SHA1 algorithm via `initialize_repository()`. But we're about to stop doing this, and thus the code would lead to a `NULL` pointer exception. Prepare the code for this and only access `the_hash_algo` after we are sure that there is a proper repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/blame.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/builtin/blame.c b/builtin/blame.c index 9aa74680a3..e325825936 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -915,7 +915,6 @@ int cmd_blame(int argc, const char **argv, const char *prefix) struct range_set ranges; unsigned int range_i; long anchor; - const int hexsz = the_hash_algo->hexsz; long num_lines = 0; const char *str_usage = cmd_is_annotate ? annotate_usage : blame_usage; const char **opt_usage = cmd_is_annotate ? annotate_opt_usage : blame_opt_usage; @@ -973,11 +972,11 @@ int cmd_blame(int argc, const char **argv, const char *prefix) } else if (show_progress < 0) show_progress = isatty(2); - if (0 < abbrev && abbrev < hexsz) + if (0 < abbrev && abbrev < (int)the_hash_algo->hexsz) /* one more abbrev length is needed for the boundary commit */ abbrev++; else if (!abbrev) - abbrev = hexsz; + abbrev = the_hash_algo->hexsz; if (revs_file && read_ancestry(revs_file)) die_errno("reading graft file '%s' failed", revs_file); -- 2.44.GIT [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 08/11] builtin/bundle: abort "verify" early when there is no repository 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (6 preceding siblings ...) 2024-04-19 9:51 ` [PATCH 07/11] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt @ 2024-04-19 9:51 ` Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 09/11] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt ` (6 subsequent siblings) 14 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-19 9:51 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 1526 bytes --] Verifying a bundle requires us to have a repository. This is encoded in `verify_bundle()`, which will return an error if there is no repository. We call `open_bundle()` before we call `verify_bundle()` though, which already performs some verifications even though we may ultimately abort due to a missing repository. This is problematic because `open_bundle()` already reads the bundle header and verifies that it contains a properly formatted hash. When there is no repository we have no clue what hash function to expect though, so we always end up assuming SHA1 here, which may or may not be correct. Furthermore, we are about to stop initializing `the_hash_algo` when there is no repository, which will lead to segfaults. Check early on whether we have a repository to fix this issue. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/bundle.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/builtin/bundle.c b/builtin/bundle.c index 3ad11dc5d0..d5d41a8f67 100644 --- a/builtin/bundle.c +++ b/builtin/bundle.c @@ -140,6 +140,11 @@ static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) { builtin_bundle_verify_usage, options, &bundle_file); /* bundle internals use argv[1] as further parameters */ + if (!startup_info->have_repository) { + ret = error(_("need a repository to verify a bundle")); + goto cleanup; + } + if ((bundle_fd = open_bundle(bundle_file, &header, &name)) < 0) { ret = 1; goto cleanup; -- 2.44.GIT [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 09/11] builtin/diff: explicitly set hash algo when there is no repo 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (7 preceding siblings ...) 2024-04-19 9:51 ` [PATCH 08/11] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt @ 2024-04-19 9:51 ` Patrick Steinhardt 2024-04-22 18:41 ` Junio C Hamano 2024-04-19 9:51 ` [PATCH 10/11] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt ` (5 subsequent siblings) 14 siblings, 1 reply; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-19 9:51 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 1811 bytes --] The git-diff(1) command can be used outside repositories to diff two files with each other. But even if there is no repository we will end up hashing the files that we are diffing so that we can print the "index" line: ``` diff --git a/a b/b index 7898192..6178079 100644 --- a/a +++ b/b @@ -1 +1 @@ -a +b ``` We implicitly use SHA1 to calculate the hash here, which is because `the_repository` gets initialized with SHA1 during the startup routine. We are about to stop doing this though such that `the_repository` only ever has a hash function when it was properly initialized via a repo's configuration. To give full control to our users, we would ideally add a new switch to git-diff(1) that allows them to specify the hash function when executed outside of a repository. But for now, we only convert the code to make this explicit such that we can stop setting the default hash algorithm for `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/diff.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/builtin/diff.c b/builtin/diff.c index 6e196e0c7d..58ec7e5da2 100644 --- a/builtin/diff.c +++ b/builtin/diff.c @@ -465,6 +465,15 @@ int cmd_diff(int argc, const char **argv, const char *prefix) no_index = DIFF_NO_INDEX_IMPLICIT; } + /* + * When operating outside of a Git repository we need to have a hash + * algorithm at hand so that we can generate the blob hashes. We + * default to SHA1 here, but may eventually want to change this to be + * configurable via a command line option. + */ + if (nongit) + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + init_diff_ui_defaults(); git_config(git_diff_ui_config, NULL); prefix = precompose_argv_prefix(argc, argv, prefix); -- 2.44.GIT [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH 09/11] builtin/diff: explicitly set hash algo when there is no repo 2024-04-19 9:51 ` [PATCH 09/11] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt @ 2024-04-22 18:41 ` Junio C Hamano 0 siblings, 0 replies; 67+ messages in thread From: Junio C Hamano @ 2024-04-22 18:41 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git Patrick Steinhardt <ps@pks.im> writes: > The git-diff(1) command can be used outside repositories to diff two > files with each other. But even if there is no repository we will end up > hashing the files that we are diffing so that we can print the "index" > line: > > ``` > diff --git a/a b/b > index 7898192..6178079 100644 > --- a/a > +++ b/b > @@ -1 +1 @@ > -a > +b > ``` This will break "git am"; it is customary to indent such sample patch block in a log message. > We implicitly use SHA1 to calculate the hash here, which is because > `the_repository` gets initialized with SHA1 during the startup routine. > We are about to stop doing this though such that `the_repository` only > ever has a hash function when it was properly initialized via a repo's > configuration. > > To give full control to our users, we would ideally add a new switch to > git-diff(1) that allows them to specify the hash function when executed > outside of a repository. But for now, we only convert the code to make > this explicit such that we can stop setting the default hash algorithm > for `the_repository`. > > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > builtin/diff.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/builtin/diff.c b/builtin/diff.c > index 6e196e0c7d..58ec7e5da2 100644 > --- a/builtin/diff.c > +++ b/builtin/diff.c > @@ -465,6 +465,15 @@ int cmd_diff(int argc, const char **argv, const char *prefix) > no_index = DIFF_NO_INDEX_IMPLICIT; > } > > + /* > + * When operating outside of a Git repository we need to have a hash > + * algorithm at hand so that we can generate the blob hashes. We > + * default to SHA1 here, but may eventually want to change this to be > + * configurable via a command line option. > + */ > + if (nongit) > + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > + > init_diff_ui_defaults(); > git_config(git_diff_ui_config, NULL); > prefix = precompose_argv_prefix(argc, argv, prefix); ^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH 10/11] builtin/shortlog: don't set up revisions without repo 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (8 preceding siblings ...) 2024-04-19 9:51 ` [PATCH 09/11] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt @ 2024-04-19 9:51 ` Patrick Steinhardt 2024-04-23 0:35 ` Justin Tobler 2024-04-19 9:51 ` [PATCH 11/11] repository: stop setting SHA1 as the default object hash Patrick Steinhardt ` (4 subsequent siblings) 14 siblings, 1 reply; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-19 9:51 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 1299 bytes --] It is possible to run git-shortlog(1) outside of a repository by passing it output from git-log(1) via standard input. Obviously, as there is no repository in that context, it is thus unsupported to pass any revisions as arguments. Reghardless of that we still end up calling `setup_revisions()`. While that works alright, it is somewhat strange. Furthermore, this is about to cause problems when we unset the default object hash. Refactor the code to only call `setup_revisions()` when we have a repository. This is safe to do as we already verify that there are no arguments when running outside of a repository anyway. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/shortlog.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/shortlog.c b/builtin/shortlog.c index 3c7cd2d6ef..d4daf31e22 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -435,7 +435,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix) usage_with_options(shortlog_usage, options); } - if (setup_revisions(argc, argv, &rev, NULL) != 1) { + if (!nongit && setup_revisions(argc, argv, &rev, NULL) != 1) { error(_("unrecognized argument: %s"), argv[1]); usage_with_options(shortlog_usage, options); } -- 2.44.GIT [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH 10/11] builtin/shortlog: don't set up revisions without repo 2024-04-19 9:51 ` [PATCH 10/11] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt @ 2024-04-23 0:35 ` Justin Tobler 0 siblings, 0 replies; 67+ messages in thread From: Justin Tobler @ 2024-04-23 0:35 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git On 24/04/19 11:51AM, Patrick Steinhardt wrote: > It is possible to run git-shortlog(1) outside of a repository by passing > it output from git-log(1) via standard input. Obviously, as there is no > repository in that context, it is thus unsupported to pass any revisions > as arguments. > > Reghardless of that we still end up calling `setup_revisions()`. While s/Reghardless/Regardless/ > that works alright, it is somewhat strange. Furthermore, this is about > to cause problems when we unset the default object hash. > > Refactor the code to only call `setup_revisions()` when we have a > repository. This is safe to do as we already verify that there are no > arguments when running outside of a repository anyway. > > Signed-off-by: Patrick Steinhardt <ps@pks.im> ... ^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH 11/11] repository: stop setting SHA1 as the default object hash 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (9 preceding siblings ...) 2024-04-19 9:51 ` [PATCH 10/11] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt @ 2024-04-19 9:51 ` Patrick Steinhardt 2024-04-19 19:12 ` [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` brian m. carlson ` (3 subsequent siblings) 14 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-19 9:51 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 1694 bytes --] During the startup of Git, we call `initialize_the_repository()` to set up `the_repository` as well as `the_index`. Part of this setup is also to set the default object hash of the repository to SHA1. This has the effect that `the_hash_algo` is getting initialized to SHA1, as well. This default hash algorithm eventually gets overridden by most Git commands via `setup_git_directory()`, which also detects the actual hash algorithm used by the repository. There are some commands though that don't access a repository at all, or at a later point only, and thus retain the default hash function for some amount of time. As some of the the preceding commits demonstrate, this can lead to subtle issues when we access `the_hash_algo` when no repository has been set up. Address this issue by dropping the set up of the default hash algorithm completely. The effect of this is that `the_hash_algo` will map to a `NULL` pointer and thus cause Git to crash when something tries to access the hash algorithm without it being properly initialized. It thus forces all Git commands to explicitly set up the hash algorithm in case there is no repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- repository.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/repository.c b/repository.c index e15b416944..b65b1a8c8b 100644 --- a/repository.c +++ b/repository.c @@ -35,8 +35,6 @@ void initialize_the_repository(void) the_repo.parsed_objects = parsed_object_pool_new(); index_state_init(&the_index, the_repository); - - repo_set_hash_algo(&the_repo, GIT_HASH_SHA1); } static void expand_base_dir(char **out, const char *in, -- 2.44.GIT [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (10 preceding siblings ...) 2024-04-19 9:51 ` [PATCH 11/11] repository: stop setting SHA1 as the default object hash Patrick Steinhardt @ 2024-04-19 19:12 ` brian m. carlson 2024-04-19 19:16 ` Junio C Hamano 2024-04-22 4:56 ` Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt ` (2 subsequent siblings) 14 siblings, 2 replies; 67+ messages in thread From: brian m. carlson @ 2024-04-19 19:12 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git [-- Attachment #1: Type: text/plain, Size: 2039 bytes --] On 2024-04-19 at 09:51:03, Patrick Steinhardt wrote: > Hi, > > when starting up, Git will initialize `the_repository` by calling > `initialize_the_repository()`. Part of this is also that we set the hash > algo of `the_repository` to SHA1, which implicitly sets `the_hash_algo` > because it is a macro expanding to `the_repository->hash_algo`. > > Usually, we eventually set up the correct hash algorithm here once we > have properly set up `the_repository` via `setup_git_directory()`. But > in some commands we actually don't require a Git repository, and thus we > will leave the SHA1 hash algorithm in place. > > This has led to some subtle bugs when the context really asks for a > SHA256 repository, which this patch series corrects for most of the > part. Some commands need further work, like for example git-diff(1), > where the user might want to have the ability to pick a hash function > when run outside of a repository. > > Ultimately, the last patch then drops the setup of the fallback hash > algorithm completely. This will cause `the_hash_algo` to be a `NULL` > pointer unless explicitly configured, and thus we now start to crash > when it gets accessed without doing so beforehand. This is a rather > risky thing to do, but it does catch bugs where we might otherwise > accidentally do the wrong thing. And even though I think it is the right > thing to do even conceptually, I'd be okay to drop it if folks think > that the risk is not worth it. I've taken a look, and other than the minor typo I noted, this seems fine. I'm in favour of getting rid of the SHA-1 default, even though my gut tells me we might find a bug or two along the way where things aren't initialized properly. I still think that'll be okay and it's worth doing, since it'll help us prepare for the case in the future where we want to switch the default and also for libification, where we won't want to make those kinds of assumptions. -- brian m. carlson (they/them or he/him) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` 2024-04-19 19:12 ` [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` brian m. carlson @ 2024-04-19 19:16 ` Junio C Hamano 2024-04-22 4:56 ` Patrick Steinhardt 1 sibling, 0 replies; 67+ messages in thread From: Junio C Hamano @ 2024-04-19 19:16 UTC (permalink / raw) To: brian m. carlson; +Cc: Patrick Steinhardt, git "brian m. carlson" <sandals@crustytoothpaste.net> writes: > I've taken a look, and other than the minor typo I noted, this seems > fine. I'm in favour of getting rid of the SHA-1 default, even though my > gut tells me we might find a bug or two along the way where things > aren't initialized properly. I still think that'll be okay and it's > worth doing, since it'll help us prepare for the case in the future > where we want to switch the default and also for libification, where we > won't want to make those kinds of assumptions. Yup, thanks for reviewing (and thanks Patrick for writing the series). ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` 2024-04-19 19:12 ` [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` brian m. carlson 2024-04-19 19:16 ` Junio C Hamano @ 2024-04-22 4:56 ` Patrick Steinhardt 1 sibling, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-22 4:56 UTC (permalink / raw) To: brian m. carlson, git [-- Attachment #1: Type: text/plain, Size: 2517 bytes --] On Fri, Apr 19, 2024 at 07:12:59PM +0000, brian m. carlson wrote: > On 2024-04-19 at 09:51:03, Patrick Steinhardt wrote: > > Hi, > > > > when starting up, Git will initialize `the_repository` by calling > > `initialize_the_repository()`. Part of this is also that we set the hash > > algo of `the_repository` to SHA1, which implicitly sets `the_hash_algo` > > because it is a macro expanding to `the_repository->hash_algo`. > > > > Usually, we eventually set up the correct hash algorithm here once we > > have properly set up `the_repository` via `setup_git_directory()`. But > > in some commands we actually don't require a Git repository, and thus we > > will leave the SHA1 hash algorithm in place. > > > > This has led to some subtle bugs when the context really asks for a > > SHA256 repository, which this patch series corrects for most of the > > part. Some commands need further work, like for example git-diff(1), > > where the user might want to have the ability to pick a hash function > > when run outside of a repository. > > > > Ultimately, the last patch then drops the setup of the fallback hash > > algorithm completely. This will cause `the_hash_algo` to be a `NULL` > > pointer unless explicitly configured, and thus we now start to crash > > when it gets accessed without doing so beforehand. This is a rather > > risky thing to do, but it does catch bugs where we might otherwise > > accidentally do the wrong thing. And even though I think it is the right > > thing to do even conceptually, I'd be okay to drop it if folks think > > that the risk is not worth it. > > I've taken a look, and other than the minor typo I noted, this seems > fine. I'm in favour of getting rid of the SHA-1 default, even though my > gut tells me we might find a bug or two along the way where things > aren't initialized properly. I still think that'll be okay and it's > worth doing, since it'll help us prepare for the case in the future > where we want to switch the default and also for libification, where we > won't want to make those kinds of assumptions. Yeah, I would expect there to be a few more bugs that I just didn't spot. But I think if we do this early in the next release cycle it would give us plenty of time to detect any such issues. And in the worst case we would still be able to revert the last patch of this series. So all in all it hopefully shouldn't be all that bad and prepares us for the future. Thanks for your review! Patrick [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo` 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (11 preceding siblings ...) 2024-04-19 19:12 ` [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` brian m. carlson @ 2024-04-23 5:07 ` Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 01/12] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt ` (12 more replies) 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt 14 siblings, 13 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 5:07 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 6510 bytes --] Hi, this is the second version of my patch series that causes us to stop relying on the SHA1 default hash. Changes compared to v1: - Various typo fixes in commit messages. - Added another patch that moves `validate_headref()` into "setup.c" to clarify that it is only used during repository discovery. - Indented a diff in a commit message so that git-am(1) is happy. Thanks! Patrick Patrick Steinhardt (12): path: harden validation of HEAD with non-standard hashes path: move `validate_headref()` to its only user parse-options-cb: only abbreviate hashes when hash algo is known attr: don't recompute default attribute source attr: fix BUG() when parsing attrs outside of repo remote-curl: fix parsing of detached SHA256 heads builtin/rev-parse: allow shortening to more than 40 hex characters builtin/blame: don't access potentially unitialized `the_hash_algo` builtin/bundle: abort "verify" early when there is no repository builtin/diff: explicitly set hash algo when there is no repo builtin/shortlog: don't set up revisions without repo repository: stop setting SHA1 as the default object hash attr.c | 31 +++++++++++++++------- builtin/blame.c | 5 ++-- builtin/bundle.c | 5 ++++ builtin/diff.c | 9 +++++++ builtin/rev-parse.c | 5 ++-- builtin/shortlog.c | 2 +- parse-options-cb.c | 3 ++- path.c | 53 -------------------------------------- path.h | 1 - remote-curl.c | 19 +++++++++++++- repository.c | 2 -- setup.c | 53 ++++++++++++++++++++++++++++++++++++++ t/t0003-attributes.sh | 15 +++++++++++ t/t0040-parse-options.sh | 17 ++++++++++++ t/t1500-rev-parse.sh | 6 +++++ t/t5550-http-fetch-dumb.sh | 15 +++++++++++ 16 files changed, 167 insertions(+), 74 deletions(-) Range-diff against v1: 1: aa4d6f508b ! 1: a986b464d3 path: harden validation of HEAD with non-standard hashes @@ Commit message current version of Git doesn't understand yet. We'd still want to detect the repository as proper Git repository in that case, and we will fail eventually with a proper error message that the hash isn't understood - when trying to set up the repostiory format. + when trying to set up the repository format. It follows that we could just leave the current code intact, as in practice the code change doesn't have any user visible impact. But it also prepares us for `the_hash_algo` being unset when there is no - repositroy. + repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> -: ---------- > 2: a347c7e6ca path: move `validate_headref()` to its only user 2: 5daaaed2b9 ! 3: c0a15b2fa6 parse-options-cb: only abbreviate hashes when hash algo is known @@ Commit message parse-options-cb: only abbreviate hashes when hash algo is known The `OPT__ABBREV()` option can be used to add an option that abbreviates - object IDs. When given an length longer than `the_hash_algo->hexsz`, - then it will instead set the length to that maximum length. + object IDs. When given a length longer than `the_hash_algo->hexsz`, then + it will instead set the length to that maximum length. It may not always be guaranteed that we have `the_hash_algo` initialized - properly as the hash algortihm can only be set up after we have set up + properly as the hash algorithm can only be set up after we have set up `the_repository`. In that case, the hash would always be truncated to the hex length of SHA1, which may not be what the user desires. 3: ae91a27ffc ! 4: 1b5f904eed attr: don't recompute default attribute source @@ Commit message variable is the null object ID then we try to look up the attr source, otherwise we skip over it. - This has approach is flawed though: the variable will never be set to + This approach is flawed though: the variable will never be set to anything else but the null object ID in case there is no attr source. Consequently, we re-compute the information on every call. And in the worst case, when we silently ignore bad trees, this will cause us to try 4: 53c8e1cd7c = 5: 26909daca4 attr: fix BUG() when parsing attrs outside of repo 5: 32a429fb60 = 6: 0b99184f50 remote-curl: fix parsing of detached SHA256 heads 6: 9cb7baa50c = 7: ccfda3c2d2 builtin/rev-parse: allow shortening to more than 40 hex characters 7: e189a4ad15 = 8: 1813e7eb5c builtin/blame: don't access potentially unitialized `the_hash_algo` 8: bc4bda3508 = 9: 31182a1fc6 builtin/bundle: abort "verify" early when there is no repository 9: 39e56dab62 ! 10: 78e19d0a1b builtin/diff: explicitly set hash algo when there is no repo @@ Commit message hashing the files that we are diffing so that we can print the "index" line: - ``` - diff --git a/a b/b - index 7898192..6178079 100644 - --- a/a - +++ b/b - @@ -1 +1 @@ - -a - +b - ``` + ``` + diff --git a/a b/b + index 7898192..6178079 100644 + --- a/a + +++ b/b + @@ -1 +1 @@ + -a + +b + ``` We implicitly use SHA1 to calculate the hash here, which is because `the_repository` gets initialized with SHA1 during the startup routine. 10: 508e28ed1e ! 11: 51bcddbc31 builtin/shortlog: don't set up revisions without repo @@ Commit message repository in that context, it is thus unsupported to pass any revisions as arguments. - Reghardless of that we still end up calling `setup_revisions()`. While + Regardless of that we still end up calling `setup_revisions()`. While that works alright, it is somewhat strange. Furthermore, this is about to cause problems when we unset the default object hash. 11: f86a6ff3ba = 12: e8126371e1 repository: stop setting SHA1 as the default object hash base-commit: 21306a098c3f174ad4c2a5cddb9069ee27a548b0 -- 2.45.0-rc0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH v2 01/12] path: harden validation of HEAD with non-standard hashes 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt @ 2024-04-23 5:07 ` Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 02/12] path: move `validate_headref()` to its only user Patrick Steinhardt ` (11 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 5:07 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2279 bytes --] The `validate_headref()` function takes a path to a supposed "HEAD" file and checks whether its format is something that we understand. It is used as part of our repository discovery to check whether a specific directory is a Git directory or not. Part of the validation is a check for a detached HEAD that contains a plain object ID. To do this validation we use `get_oid_hex()`, which relies on `the_hash_algo`. At this point in time the hash algo cannot yet be initialized though because we didn't yet read the Git config. Consequently, it will always be the SHA1 hash algorithm. In practice this works alright because `get_oid_hex()` only ends up checking whether the prefix of the buffer is a valid object ID. And because SHA1 is shorter than SHA256, the function will successfully parse SHA256 object IDs, as well. It is somewhat fragile though and not really the intent to only check for SHA1. With this in mind, harden the code to use `get_oid_hex_any()` to check whether the "HEAD" file parses as any known hash. One might be hard pressed to tighten the check even further and fully validate the file contents, not only the prefix. In practice though that wouldn't make a lot of sense as it could be that the repository uses a hash function that produces longer hashes than SHA256, but which the current version of Git doesn't understand yet. We'd still want to detect the repository as proper Git repository in that case, and we will fail eventually with a proper error message that the hash isn't understood when trying to set up the repository format. It follows that we could just leave the current code intact, as in practice the code change doesn't have any user visible impact. But it also prepares us for `the_hash_algo` being unset when there is no repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- path.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/path.c b/path.c index 67229edb9c..cc02165530 100644 --- a/path.c +++ b/path.c @@ -693,7 +693,7 @@ int validate_headref(const char *path) /* * Is this a detached HEAD? */ - if (!get_oid_hex(buffer, &oid)) + if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN) return 0; return -1; -- 2.45.0-rc0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v2 02/12] path: move `validate_headref()` to its only user 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 01/12] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt @ 2024-04-23 5:07 ` Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 03/12] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt ` (10 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 5:07 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 3965 bytes --] While `validate_headref()` is only called from `is_git_directory()` in "setup.c", it is currently implemented in "path.c". Move it over such that it becomes clear that it is only really used during setup in order to discover repositories. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- path.c | 53 ----------------------------------------------------- path.h | 1 - setup.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 53 insertions(+), 54 deletions(-) diff --git a/path.c b/path.c index cc02165530..bd6e25245d 100644 --- a/path.c +++ b/path.c @@ -5,7 +5,6 @@ #include "abspath.h" #include "environment.h" #include "gettext.h" -#include "hex.h" #include "repository.h" #include "strbuf.h" #include "string-list.h" @@ -647,58 +646,6 @@ void strbuf_git_common_path(struct strbuf *sb, va_end(args); } -int validate_headref(const char *path) -{ - struct stat st; - char buffer[256]; - const char *refname; - struct object_id oid; - int fd; - ssize_t len; - - if (lstat(path, &st) < 0) - return -1; - - /* Make sure it is a "refs/.." symlink */ - if (S_ISLNK(st.st_mode)) { - len = readlink(path, buffer, sizeof(buffer)-1); - if (len >= 5 && !memcmp("refs/", buffer, 5)) - return 0; - return -1; - } - - /* - * Anything else, just open it and try to see if it is a symbolic ref. - */ - fd = open(path, O_RDONLY); - if (fd < 0) - return -1; - len = read_in_full(fd, buffer, sizeof(buffer)-1); - close(fd); - - if (len < 0) - return -1; - buffer[len] = '\0'; - - /* - * Is it a symbolic ref? - */ - if (skip_prefix(buffer, "ref:", &refname)) { - while (isspace(*refname)) - refname++; - if (starts_with(refname, "refs/")) - return 0; - } - - /* - * Is this a detached HEAD? - */ - if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN) - return 0; - - return -1; -} - static struct passwd *getpw_str(const char *username, size_t len) { struct passwd *pw; diff --git a/path.h b/path.h index ea96487b29..c3bc8617bd 100644 --- a/path.h +++ b/path.h @@ -173,7 +173,6 @@ const char *git_path_fetch_head(struct repository *r); const char *git_path_shallow(struct repository *r); int ends_with_path_components(const char *path, const char *components); -int validate_headref(const char *ref); int calc_shared_perm(int mode); int adjust_shared_perm(const char *path); diff --git a/setup.c b/setup.c index f4b32f76e3..7c996659bd 100644 --- a/setup.c +++ b/setup.c @@ -4,6 +4,7 @@ #include "environment.h" #include "exec-cmd.h" #include "gettext.h" +#include "hex.h" #include "object-name.h" #include "refs.h" #include "repository.h" @@ -341,6 +342,58 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir) return ret; } +static int validate_headref(const char *path) +{ + struct stat st; + char buffer[256]; + const char *refname; + struct object_id oid; + int fd; + ssize_t len; + + if (lstat(path, &st) < 0) + return -1; + + /* Make sure it is a "refs/.." symlink */ + if (S_ISLNK(st.st_mode)) { + len = readlink(path, buffer, sizeof(buffer)-1); + if (len >= 5 && !memcmp("refs/", buffer, 5)) + return 0; + return -1; + } + + /* + * Anything else, just open it and try to see if it is a symbolic ref. + */ + fd = open(path, O_RDONLY); + if (fd < 0) + return -1; + len = read_in_full(fd, buffer, sizeof(buffer)-1); + close(fd); + + if (len < 0) + return -1; + buffer[len] = '\0'; + + /* + * Is it a symbolic ref? + */ + if (skip_prefix(buffer, "ref:", &refname)) { + while (isspace(*refname)) + refname++; + if (starts_with(refname, "refs/")) + return 0; + } + + /* + * Is this a detached HEAD? + */ + if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN) + return 0; + + return -1; +} + /* * Test if it looks like we're at a git directory. * We want to see: -- 2.45.0-rc0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v2 03/12] parse-options-cb: only abbreviate hashes when hash algo is known 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 01/12] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 02/12] path: move `validate_headref()` to its only user Patrick Steinhardt @ 2024-04-23 5:07 ` Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 04/12] attr: don't recompute default attribute source Patrick Steinhardt ` (9 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 5:07 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2445 bytes --] The `OPT__ABBREV()` option can be used to add an option that abbreviates object IDs. When given a length longer than `the_hash_algo->hexsz`, then it will instead set the length to that maximum length. It may not always be guaranteed that we have `the_hash_algo` initialized properly as the hash algorithm can only be set up after we have set up `the_repository`. In that case, the hash would always be truncated to the hex length of SHA1, which may not be what the user desires. In practice it's not a problem as all commands that use `OPT__ABBREV()` also have `RUN_SETUP` set and thus cannot work without a repository. Consequently, both `the_repository` and `the_hash_algo` would be properly set up. Regardless of that, harden the code to not truncate the length when we didn't set up a repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- parse-options-cb.c | 3 ++- t/t0040-parse-options.sh | 17 +++++++++++++++++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/parse-options-cb.c b/parse-options-cb.c index bdc7fae497..d99d688d3c 100644 --- a/parse-options-cb.c +++ b/parse-options-cb.c @@ -7,6 +7,7 @@ #include "environment.h" #include "gettext.h" #include "object-name.h" +#include "setup.h" #include "string-list.h" #include "strvec.h" #include "oid-array.h" @@ -29,7 +30,7 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset) opt->long_name); if (v && v < MINIMUM_ABBREV) v = MINIMUM_ABBREV; - else if (v > the_hash_algo->hexsz) + else if (startup_info->have_repository && v > the_hash_algo->hexsz) v = the_hash_algo->hexsz; } *(int *)(opt->value) = v; diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh index 8bb2a8b453..45a773642f 100755 --- a/t/t0040-parse-options.sh +++ b/t/t0040-parse-options.sh @@ -176,6 +176,23 @@ test_expect_success 'long options' ' test_cmp expect output ' +test_expect_success 'abbreviate to something longer than SHA1 length' ' + cat >expect <<-EOF && + boolean: 0 + integer: 0 + magnitude: 0 + timestamp: 0 + string: (not set) + abbrev: 100 + verbose: -1 + quiet: 0 + dry run: no + file: (not set) + EOF + test-tool parse-options --abbrev=100 >output && + test_cmp expect output +' + test_expect_success 'missing required value' ' cat >expect <<-\EOF && error: switch `s'\'' requires a value -- 2.45.0-rc0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v2 04/12] attr: don't recompute default attribute source 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt ` (2 preceding siblings ...) 2024-04-23 5:07 ` [PATCH v2 03/12] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt @ 2024-04-23 5:07 ` Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 05/12] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt ` (8 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 5:07 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2887 bytes --] The `default_attr_source()` function lazily computes the attr source supposedly once, only. This is done via a static variable `attr_source` that contains the resolved object ID of the attr source's tree. If the variable is the null object ID then we try to look up the attr source, otherwise we skip over it. This approach is flawed though: the variable will never be set to anything else but the null object ID in case there is no attr source. Consequently, we re-compute the information on every call. And in the worst case, when we silently ignore bad trees, this will cause us to try and look up the treeish every single time. Improve this by introducing a separate variable `has_attr_source` to track whether we already computed the attr source and, if so, whether we have an attr source or not. This also allows us to convert the `ignore_bad_attr_tree` to not be static anymore as the code will only be executed once anyway. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- attr.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/attr.c b/attr.c index 679e42258c..9d911aeb31 100644 --- a/attr.c +++ b/attr.c @@ -1206,15 +1206,16 @@ static void collect_some_attrs(struct index_state *istate, } static const char *default_attr_source_tree_object_name; -static int ignore_bad_attr_tree; void set_git_attr_source(const char *tree_object_name) { default_attr_source_tree_object_name = xstrdup(tree_object_name); } -static void compute_default_attr_source(struct object_id *attr_source) +static int compute_default_attr_source(struct object_id *attr_source) { + int ignore_bad_attr_tree = 0; + if (!default_attr_source_tree_object_name) default_attr_source_tree_object_name = getenv(GIT_ATTR_SOURCE_ENVIRONMENT); @@ -1230,22 +1231,28 @@ static void compute_default_attr_source(struct object_id *attr_source) ignore_bad_attr_tree = 1; } - if (!default_attr_source_tree_object_name || !is_null_oid(attr_source)) - return; + if (!default_attr_source_tree_object_name) + return 0; if (repo_get_oid_treeish(the_repository, default_attr_source_tree_object_name, - attr_source) && !ignore_bad_attr_tree) - die(_("bad --attr-source or GIT_ATTR_SOURCE")); + attr_source)) { + if (!ignore_bad_attr_tree) + die(_("bad --attr-source or GIT_ATTR_SOURCE")); + return 0; + } + + return 1; } static struct object_id *default_attr_source(void) { static struct object_id attr_source; + static int has_attr_source = -1; - if (is_null_oid(&attr_source)) - compute_default_attr_source(&attr_source); - if (is_null_oid(&attr_source)) + if (has_attr_source < 0) + has_attr_source = compute_default_attr_source(&attr_source); + if (!has_attr_source) return NULL; return &attr_source; } -- 2.45.0-rc0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v2 05/12] attr: fix BUG() when parsing attrs outside of repo 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt ` (3 preceding siblings ...) 2024-04-23 5:07 ` [PATCH v2 04/12] attr: don't recompute default attribute source Patrick Steinhardt @ 2024-04-23 5:07 ` Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 06/12] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt ` (7 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 5:07 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2067 bytes --] If either the `--attr-source` option or the `GIT_ATTR_SOURCE` envvar are set, then `compute_default_attr_source()` will try to look up the value as a treeish. It is possible to hit that function while outside of a Git repository though, for example when using `git grep --no-index`. In that case, Git will hit a bug because we try to look up the main ref store outside of a repository. Handle the case gracefully and detect when we try to look up an attr source without a repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- attr.c | 6 ++++++ t/t0003-attributes.sh | 15 +++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/attr.c b/attr.c index 9d911aeb31..4bd09bcb4b 100644 --- a/attr.c +++ b/attr.c @@ -1234,6 +1234,12 @@ static int compute_default_attr_source(struct object_id *attr_source) if (!default_attr_source_tree_object_name) return 0; + if (!startup_info->have_repository) { + if (!ignore_bad_attr_tree) + die(_("cannot use --attr-source or GIT_ATTR_SOURCE without repo")); + return 0; + } + if (repo_get_oid_treeish(the_repository, default_attr_source_tree_object_name, attr_source)) { diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh index 774b52c298..3efdec54dd 100755 --- a/t/t0003-attributes.sh +++ b/t/t0003-attributes.sh @@ -428,6 +428,21 @@ test_expect_success 'precedence of --attr-source, GIT_ATTR_SOURCE, then attr.tre ) ' +test_expect_success 'diff without repository with attr source' ' + mkdir -p "$TRASH_DIRECTORY/outside/nongit" && + ( + cd "$TRASH_DIRECTORY/outside/nongit" && + GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/outside" && + export GIT_CEILING_DIRECTORIES && + touch file && + cat >expect <<-EOF && + fatal: cannot use --attr-source or GIT_ATTR_SOURCE without repo + EOF + test_must_fail env GIT_ATTR_SOURCE=HEAD git grep --no-index foo file 2>err && + test_cmp expect err + ) +' + test_expect_success 'bare repository: with --source' ' ( cd bare.git && -- 2.45.0-rc0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v2 06/12] remote-curl: fix parsing of detached SHA256 heads 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt ` (4 preceding siblings ...) 2024-04-23 5:07 ` [PATCH v2 05/12] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt @ 2024-04-23 5:07 ` Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 07/12] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt ` (6 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 5:07 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 3673 bytes --] The dumb HTTP transport tries to read the remote HEAD reference by downloading the "HEAD" file and then parsing it via `http_fetch_ref()`. This function will either parse the file as an object ID in case it is exactly `the_hash_algo->hexsz` long, or otherwise it will check whether the reference starts with "ref :" and parse it as a symbolic ref. This is broken when parsing detached HEADs of a remote SHA256 repository because we never update `the_hash_algo` to the discovered remote object hash. Consequently, `the_hash_algo` will always be the fallback SHA1 hash algorithm, which will cause us to fail parsing HEAD altogteher when it contains a SHA256 object ID. Fix this issue by setting up `the_hash_algo` via `repo_set_hash_algo()`. While at it, let's make the expected SHA1 fallback explicit in our code, which also addresses an upcoming issue where we are going to remove the SHA1 fallback for `the_hash_algo`. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- remote-curl.c | 19 ++++++++++++++++++- t/t5550-http-fetch-dumb.sh | 15 +++++++++++++++ 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/remote-curl.c b/remote-curl.c index 0b6d7815fd..004b707fdf 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -266,12 +266,23 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push) return list; } +/* + * Try to detect the hash algorithm used by the remote repository when using + * the dumb HTTP transport. As dumb transports cannot tell us the object hash + * directly have to derive it from the advertised ref lengths. + */ static const struct git_hash_algo *detect_hash_algo(struct discovery *heads) { const char *p = memchr(heads->buf, '\t', heads->len); int algo; + + /* + * In case the remote has no refs we have no way to reliably determine + * the object hash used by that repository. In that case we simply fall + * back to SHA1, which may or may not be correct. + */ if (!p) - return the_hash_algo; + return &hash_algos[GIT_HASH_SHA1]; algo = hash_algo_by_length((p - heads->buf) / 2); if (algo == GIT_HASH_UNKNOWN) @@ -295,6 +306,12 @@ static struct ref *parse_info_refs(struct discovery *heads) "is this a git repository?", transport_anonymize_url(url.buf)); + /* + * Set the repository's hash algo to whatever we have just detected. + * This ensures that we can correctly parse the remote references. + */ + repo_set_hash_algo(the_repository, hash_algo_by_ptr(options.hash_algo)); + data = heads->buf; start = NULL; mid = data; diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh index 4c3b32785d..5f16cbc58d 100755 --- a/t/t5550-http-fetch-dumb.sh +++ b/t/t5550-http-fetch-dumb.sh @@ -55,6 +55,21 @@ test_expect_success 'list refs from outside any repository' ' test_cmp expect actual ' + +test_expect_success 'list detached HEAD from outside any repository' ' + git clone --mirror "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \ + "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" && + git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" \ + update-ref --no-deref HEAD refs/heads/main && + git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" update-server-info && + cat >expect <<-EOF && + $(git rev-parse main) HEAD + $(git rev-parse main) refs/heads/main + EOF + nongit git ls-remote "$HTTPD_URL/dumb/repo-detached.git" >actual && + test_cmp expect actual +' + test_expect_success 'create password-protected repository' ' mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH/auth/dumb/" && cp -Rf "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \ -- 2.45.0-rc0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v2 07/12] builtin/rev-parse: allow shortening to more than 40 hex characters 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt ` (5 preceding siblings ...) 2024-04-23 5:07 ` [PATCH v2 06/12] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt @ 2024-04-23 5:07 ` Patrick Steinhardt 2024-04-23 5:08 ` [PATCH v2 08/12] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt ` (5 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 5:07 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1997 bytes --] The `--short=` option for git-rev-parse(1) allows the user to specify to how many characters object IDs should be shortened to. The option is broken though for SHA256 repositories because we set the maximum allowed hash size to `the_hash_algo->hexsz` before we have even set up the repo. Consequently, `the_hash_algo` will always be SHA1 and thus we truncate every hash after at most 40 characters. Fix this by accessing `the_hash_algo` only after we have set up the repo. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/rev-parse.c | 5 ++--- t/t1500-rev-parse.sh | 6 ++++++ 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c index 624182e507..d7b87c605e 100644 --- a/builtin/rev-parse.c +++ b/builtin/rev-parse.c @@ -687,7 +687,6 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix) const char *name = NULL; struct object_context unused; struct strbuf buf = STRBUF_INIT; - const int hexsz = the_hash_algo->hexsz; int seen_end_of_options = 0; enum format_type format = FORMAT_DEFAULT; @@ -863,8 +862,8 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix) abbrev = strtoul(arg, NULL, 10); if (abbrev < MINIMUM_ABBREV) abbrev = MINIMUM_ABBREV; - else if (hexsz <= abbrev) - abbrev = hexsz; + else if ((int)the_hash_algo->hexsz <= abbrev) + abbrev = the_hash_algo->hexsz; continue; } if (!strcmp(arg, "--sq")) { diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh index a669e592f1..30c31918fd 100755 --- a/t/t1500-rev-parse.sh +++ b/t/t1500-rev-parse.sh @@ -304,4 +304,10 @@ test_expect_success 'rev-parse --bisect includes bad, excludes good' ' test_cmp expect actual ' +test_expect_success '--short= truncates to the actual hash length' ' + git rev-parse HEAD >expect && + git rev-parse --short=100 HEAD >actual && + test_cmp expect actual +' + test_done -- 2.45.0-rc0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v2 08/12] builtin/blame: don't access potentially unitialized `the_hash_algo` 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt ` (6 preceding siblings ...) 2024-04-23 5:07 ` [PATCH v2 07/12] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt @ 2024-04-23 5:08 ` Patrick Steinhardt 2024-04-23 5:08 ` [PATCH v2 09/12] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt ` (4 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 5:08 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2003 bytes --] We access `the_hash_algo` in git-blame(1) before we have executed `parse_options_start()`, which may not be properly set up in case we have no repository. This is fine for most of the part because all the call paths that lead to it (git-blame(1), git-annotate(1) as well as git-pick-axe(1)) specify `RUN_SETUP` and thus require a repository. There is one exception though, namely when passing `-h` to print the help. Here we will access `the_hash_algo` even if there is no repo. This works fine right now because `the_hash_algo` gets sets up to point to the SHA1 algorithm via `initialize_repository()`. But we're about to stop doing this, and thus the code would lead to a `NULL` pointer exception. Prepare the code for this and only access `the_hash_algo` after we are sure that there is a proper repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/blame.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/builtin/blame.c b/builtin/blame.c index 9aa74680a3..e325825936 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -915,7 +915,6 @@ int cmd_blame(int argc, const char **argv, const char *prefix) struct range_set ranges; unsigned int range_i; long anchor; - const int hexsz = the_hash_algo->hexsz; long num_lines = 0; const char *str_usage = cmd_is_annotate ? annotate_usage : blame_usage; const char **opt_usage = cmd_is_annotate ? annotate_opt_usage : blame_opt_usage; @@ -973,11 +972,11 @@ int cmd_blame(int argc, const char **argv, const char *prefix) } else if (show_progress < 0) show_progress = isatty(2); - if (0 < abbrev && abbrev < hexsz) + if (0 < abbrev && abbrev < (int)the_hash_algo->hexsz) /* one more abbrev length is needed for the boundary commit */ abbrev++; else if (!abbrev) - abbrev = hexsz; + abbrev = the_hash_algo->hexsz; if (revs_file && read_ancestry(revs_file)) die_errno("reading graft file '%s' failed", revs_file); -- 2.45.0-rc0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v2 09/12] builtin/bundle: abort "verify" early when there is no repository 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt ` (7 preceding siblings ...) 2024-04-23 5:08 ` [PATCH v2 08/12] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt @ 2024-04-23 5:08 ` Patrick Steinhardt 2024-04-23 5:08 ` [PATCH v2 10/12] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt ` (3 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 5:08 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1528 bytes --] Verifying a bundle requires us to have a repository. This is encoded in `verify_bundle()`, which will return an error if there is no repository. We call `open_bundle()` before we call `verify_bundle()` though, which already performs some verifications even though we may ultimately abort due to a missing repository. This is problematic because `open_bundle()` already reads the bundle header and verifies that it contains a properly formatted hash. When there is no repository we have no clue what hash function to expect though, so we always end up assuming SHA1 here, which may or may not be correct. Furthermore, we are about to stop initializing `the_hash_algo` when there is no repository, which will lead to segfaults. Check early on whether we have a repository to fix this issue. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/bundle.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/builtin/bundle.c b/builtin/bundle.c index 3ad11dc5d0..d5d41a8f67 100644 --- a/builtin/bundle.c +++ b/builtin/bundle.c @@ -140,6 +140,11 @@ static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) { builtin_bundle_verify_usage, options, &bundle_file); /* bundle internals use argv[1] as further parameters */ + if (!startup_info->have_repository) { + ret = error(_("need a repository to verify a bundle")); + goto cleanup; + } + if ((bundle_fd = open_bundle(bundle_file, &header, &name)) < 0) { ret = 1; goto cleanup; -- 2.45.0-rc0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v2 10/12] builtin/diff: explicitly set hash algo when there is no repo 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt ` (8 preceding siblings ...) 2024-04-23 5:08 ` [PATCH v2 09/12] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt @ 2024-04-23 5:08 ` Patrick Steinhardt 2024-04-23 5:08 ` [PATCH v2 11/12] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt ` (2 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 5:08 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1849 bytes --] The git-diff(1) command can be used outside repositories to diff two files with each other. But even if there is no repository we will end up hashing the files that we are diffing so that we can print the "index" line: ``` diff --git a/a b/b index 7898192..6178079 100644 --- a/a +++ b/b @@ -1 +1 @@ -a +b ``` We implicitly use SHA1 to calculate the hash here, which is because `the_repository` gets initialized with SHA1 during the startup routine. We are about to stop doing this though such that `the_repository` only ever has a hash function when it was properly initialized via a repo's configuration. To give full control to our users, we would ideally add a new switch to git-diff(1) that allows them to specify the hash function when executed outside of a repository. But for now, we only convert the code to make this explicit such that we can stop setting the default hash algorithm for `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/diff.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/builtin/diff.c b/builtin/diff.c index 6e196e0c7d..58ec7e5da2 100644 --- a/builtin/diff.c +++ b/builtin/diff.c @@ -465,6 +465,15 @@ int cmd_diff(int argc, const char **argv, const char *prefix) no_index = DIFF_NO_INDEX_IMPLICIT; } + /* + * When operating outside of a Git repository we need to have a hash + * algorithm at hand so that we can generate the blob hashes. We + * default to SHA1 here, but may eventually want to change this to be + * configurable via a command line option. + */ + if (nongit) + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + init_diff_ui_defaults(); git_config(git_diff_ui_config, NULL); prefix = precompose_argv_prefix(argc, argv, prefix); -- 2.45.0-rc0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v2 11/12] builtin/shortlog: don't set up revisions without repo 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt ` (9 preceding siblings ...) 2024-04-23 5:08 ` [PATCH v2 10/12] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt @ 2024-04-23 5:08 ` Patrick Steinhardt 2024-04-23 5:08 ` [PATCH v2 12/12] repository: stop setting SHA1 as the default object hash Patrick Steinhardt 2024-04-27 22:09 ` [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo` Junio C Hamano 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 5:08 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1300 bytes --] It is possible to run git-shortlog(1) outside of a repository by passing it output from git-log(1) via standard input. Obviously, as there is no repository in that context, it is thus unsupported to pass any revisions as arguments. Regardless of that we still end up calling `setup_revisions()`. While that works alright, it is somewhat strange. Furthermore, this is about to cause problems when we unset the default object hash. Refactor the code to only call `setup_revisions()` when we have a repository. This is safe to do as we already verify that there are no arguments when running outside of a repository anyway. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/shortlog.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/shortlog.c b/builtin/shortlog.c index 3c7cd2d6ef..d4daf31e22 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -435,7 +435,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix) usage_with_options(shortlog_usage, options); } - if (setup_revisions(argc, argv, &rev, NULL) != 1) { + if (!nongit && setup_revisions(argc, argv, &rev, NULL) != 1) { error(_("unrecognized argument: %s"), argv[1]); usage_with_options(shortlog_usage, options); } -- 2.45.0-rc0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v2 12/12] repository: stop setting SHA1 as the default object hash 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt ` (10 preceding siblings ...) 2024-04-23 5:08 ` [PATCH v2 11/12] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt @ 2024-04-23 5:08 ` Patrick Steinhardt 2024-04-27 22:09 ` [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo` Junio C Hamano 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-23 5:08 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1696 bytes --] During the startup of Git, we call `initialize_the_repository()` to set up `the_repository` as well as `the_index`. Part of this setup is also to set the default object hash of the repository to SHA1. This has the effect that `the_hash_algo` is getting initialized to SHA1, as well. This default hash algorithm eventually gets overridden by most Git commands via `setup_git_directory()`, which also detects the actual hash algorithm used by the repository. There are some commands though that don't access a repository at all, or at a later point only, and thus retain the default hash function for some amount of time. As some of the the preceding commits demonstrate, this can lead to subtle issues when we access `the_hash_algo` when no repository has been set up. Address this issue by dropping the set up of the default hash algorithm completely. The effect of this is that `the_hash_algo` will map to a `NULL` pointer and thus cause Git to crash when something tries to access the hash algorithm without it being properly initialized. It thus forces all Git commands to explicitly set up the hash algorithm in case there is no repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- repository.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/repository.c b/repository.c index e15b416944..b65b1a8c8b 100644 --- a/repository.c +++ b/repository.c @@ -35,8 +35,6 @@ void initialize_the_repository(void) the_repo.parsed_objects = parsed_object_pool_new(); index_state_init(&the_index, the_repository); - - repo_set_hash_algo(&the_repo, GIT_HASH_SHA1); } static void expand_base_dir(char **out, const char *in, -- 2.45.0-rc0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo` 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt ` (11 preceding siblings ...) 2024-04-23 5:08 ` [PATCH v2 12/12] repository: stop setting SHA1 as the default object hash Patrick Steinhardt @ 2024-04-27 22:09 ` Junio C Hamano 2024-04-29 6:05 ` Patrick Steinhardt 12 siblings, 1 reply; 67+ messages in thread From: Junio C Hamano @ 2024-04-27 22:09 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, brian m. carlson, Justin Tobler, Josh Steadmon Patrick Steinhardt <ps@pks.im> writes: [cc: added Josh as the commit-graph fuzzer was his creation]. > this is the second version of my patch series that causes us to stop > relying on the SHA1 default hash. With this topic merged, 'seen' fails "fuzz smoke test"; I think this https://github.com/git/git/actions/runs/8807729398/job/24175445340 is the first merge of this topic into 'seen' where "fuzz smoke test" started failing. With the merge of the topic from 'seen' reverted tentatively, https://github.com/git/git/actions/runs/8862811497/job/24336185541 the same test seems happy. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo` 2024-04-27 22:09 ` [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo` Junio C Hamano @ 2024-04-29 6:05 ` Patrick Steinhardt 0 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:05 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, brian m. carlson, Justin Tobler, Josh Steadmon [-- Attachment #1: Type: text/plain, Size: 1658 bytes --] On Sat, Apr 27, 2024 at 03:09:43PM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > [cc: added Josh as the commit-graph fuzzer was his creation]. > > > this is the second version of my patch series that causes us to stop > > relying on the SHA1 default hash. > > With this topic merged, 'seen' fails "fuzz smoke test"; I think this > > https://github.com/git/git/actions/runs/8807729398/job/24175445340 > > is the first merge of this topic into 'seen' where "fuzz smoke test" > started failing. > > With the merge of the topic from 'seen' reverted tentatively, > > https://github.com/git/git/actions/runs/8862811497/job/24336185541 > > the same test seems happy. Indeed, thanks for the heads up! Another test gap that we have in the GitLab CI setup. I'll add a separate patch series to plug this gap and add the job to GitLab CI, too. The fix for the failure is easy: diff --git a/oss-fuzz/fuzz-commit-graph.c b/oss-fuzz/fuzz-commit-graph.c index 2992079dd9..94ecbb9242 100644 --- a/oss-fuzz/fuzz-commit-graph.c +++ b/oss-fuzz/fuzz-commit-graph.c @@ -18,6 +18,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) * touching the disk to keep the individual fuzz-test cases as fast as * possible. */ + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); the_repository->settings.commit_graph_generation_version = 2; the_repository->settings.commit_graph_read_changed_paths = 1; g = parse_commit_graph(&the_repository->settings, (void *)data, size); I'll send a v3. Thanks! Patrick [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH v3 00/13] Stop relying on SHA1 fallback for `the_hash_algo` 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (12 preceding siblings ...) 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt @ 2024-04-29 6:34 ` Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 01/13] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt ` (12 more replies) 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt 14 siblings, 13 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 3161 bytes --] Hi, this is the third version of my patch series that stops relying on the SHA1 fallback configured for `the_hash_algo`. There's only a single change compared to v2, which is a new patch that fixes a segfault in the commit-graph fuzzer. Thanks! Patrick Patrick Steinhardt (13): path: harden validation of HEAD with non-standard hashes path: move `validate_headref()` to its only user parse-options-cb: only abbreviate hashes when hash algo is known attr: don't recompute default attribute source attr: fix BUG() when parsing attrs outside of repo remote-curl: fix parsing of detached SHA256 heads builtin/rev-parse: allow shortening to more than 40 hex characters builtin/blame: don't access potentially unitialized `the_hash_algo` builtin/bundle: abort "verify" early when there is no repository builtin/diff: explicitly set hash algo when there is no repo builtin/shortlog: don't set up revisions without repo oss-fuzz/commit-graph: set up hash algorithm repository: stop setting SHA1 as the default object hash attr.c | 31 +++++++++++++++------ builtin/blame.c | 5 ++-- builtin/bundle.c | 5 ++++ builtin/diff.c | 9 ++++++ builtin/rev-parse.c | 5 ++-- builtin/shortlog.c | 2 +- oss-fuzz/fuzz-commit-graph.c | 1 + parse-options-cb.c | 3 +- path.c | 53 ------------------------------------ path.h | 1 - remote-curl.c | 19 ++++++++++++- repository.c | 2 -- setup.c | 53 ++++++++++++++++++++++++++++++++++++ t/t0003-attributes.sh | 15 ++++++++++ t/t0040-parse-options.sh | 17 ++++++++++++ t/t1500-rev-parse.sh | 6 ++++ t/t5550-http-fetch-dumb.sh | 15 ++++++++++ 17 files changed, 168 insertions(+), 74 deletions(-) Range-diff against v2: 1: a986b464d3 = 1: 5134f35cda path: harden validation of HEAD with non-standard hashes 2: a347c7e6ca = 2: 589b6a99ef path: move `validate_headref()` to its only user 3: c0a15b2fa6 = 3: 9a63c445d2 parse-options-cb: only abbreviate hashes when hash algo is known 4: 1b5f904eed = 4: 929bacbfce attr: don't recompute default attribute source 5: 26909daca4 = 5: 8f20aec1ee attr: fix BUG() when parsing attrs outside of repo 6: 0b99184f50 = 6: 53439067a1 remote-curl: fix parsing of detached SHA256 heads 7: ccfda3c2d2 = 7: 1f74960760 builtin/rev-parse: allow shortening to more than 40 hex characters 8: 1813e7eb5c = 8: 2d985abca1 builtin/blame: don't access potentially unitialized `the_hash_algo` 9: 31182a1fc6 = 9: f3b23d28aa builtin/bundle: abort "verify" early when there is no repository 10: 78e19d0a1b = 10: 7577b6b96c builtin/diff: explicitly set hash algo when there is no repo 11: 51bcddbc31 = 11: 509c79d1d3 builtin/shortlog: don't set up revisions without repo -: ---------- > 12: 660f976129 oss-fuzz/commit-graph: set up hash algorithm 12: e8126371e1 = 13: 95909c2da5 repository: stop setting SHA1 as the default object hash -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH v3 01/13] path: harden validation of HEAD with non-standard hashes 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt @ 2024-04-29 6:34 ` Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 02/13] path: move `validate_headref()` to its only user Patrick Steinhardt ` (11 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2279 bytes --] The `validate_headref()` function takes a path to a supposed "HEAD" file and checks whether its format is something that we understand. It is used as part of our repository discovery to check whether a specific directory is a Git directory or not. Part of the validation is a check for a detached HEAD that contains a plain object ID. To do this validation we use `get_oid_hex()`, which relies on `the_hash_algo`. At this point in time the hash algo cannot yet be initialized though because we didn't yet read the Git config. Consequently, it will always be the SHA1 hash algorithm. In practice this works alright because `get_oid_hex()` only ends up checking whether the prefix of the buffer is a valid object ID. And because SHA1 is shorter than SHA256, the function will successfully parse SHA256 object IDs, as well. It is somewhat fragile though and not really the intent to only check for SHA1. With this in mind, harden the code to use `get_oid_hex_any()` to check whether the "HEAD" file parses as any known hash. One might be hard pressed to tighten the check even further and fully validate the file contents, not only the prefix. In practice though that wouldn't make a lot of sense as it could be that the repository uses a hash function that produces longer hashes than SHA256, but which the current version of Git doesn't understand yet. We'd still want to detect the repository as proper Git repository in that case, and we will fail eventually with a proper error message that the hash isn't understood when trying to set up the repository format. It follows that we could just leave the current code intact, as in practice the code change doesn't have any user visible impact. But it also prepares us for `the_hash_algo` being unset when there is no repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- path.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/path.c b/path.c index 67229edb9c..cc02165530 100644 --- a/path.c +++ b/path.c @@ -693,7 +693,7 @@ int validate_headref(const char *path) /* * Is this a detached HEAD? */ - if (!get_oid_hex(buffer, &oid)) + if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN) return 0; return -1; -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v3 02/13] path: move `validate_headref()` to its only user 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 01/13] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt @ 2024-04-29 6:34 ` Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 03/13] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt ` (10 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 3965 bytes --] While `validate_headref()` is only called from `is_git_directory()` in "setup.c", it is currently implemented in "path.c". Move it over such that it becomes clear that it is only really used during setup in order to discover repositories. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- path.c | 53 ----------------------------------------------------- path.h | 1 - setup.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 53 insertions(+), 54 deletions(-) diff --git a/path.c b/path.c index cc02165530..bd6e25245d 100644 --- a/path.c +++ b/path.c @@ -5,7 +5,6 @@ #include "abspath.h" #include "environment.h" #include "gettext.h" -#include "hex.h" #include "repository.h" #include "strbuf.h" #include "string-list.h" @@ -647,58 +646,6 @@ void strbuf_git_common_path(struct strbuf *sb, va_end(args); } -int validate_headref(const char *path) -{ - struct stat st; - char buffer[256]; - const char *refname; - struct object_id oid; - int fd; - ssize_t len; - - if (lstat(path, &st) < 0) - return -1; - - /* Make sure it is a "refs/.." symlink */ - if (S_ISLNK(st.st_mode)) { - len = readlink(path, buffer, sizeof(buffer)-1); - if (len >= 5 && !memcmp("refs/", buffer, 5)) - return 0; - return -1; - } - - /* - * Anything else, just open it and try to see if it is a symbolic ref. - */ - fd = open(path, O_RDONLY); - if (fd < 0) - return -1; - len = read_in_full(fd, buffer, sizeof(buffer)-1); - close(fd); - - if (len < 0) - return -1; - buffer[len] = '\0'; - - /* - * Is it a symbolic ref? - */ - if (skip_prefix(buffer, "ref:", &refname)) { - while (isspace(*refname)) - refname++; - if (starts_with(refname, "refs/")) - return 0; - } - - /* - * Is this a detached HEAD? - */ - if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN) - return 0; - - return -1; -} - static struct passwd *getpw_str(const char *username, size_t len) { struct passwd *pw; diff --git a/path.h b/path.h index ea96487b29..c3bc8617bd 100644 --- a/path.h +++ b/path.h @@ -173,7 +173,6 @@ const char *git_path_fetch_head(struct repository *r); const char *git_path_shallow(struct repository *r); int ends_with_path_components(const char *path, const char *components); -int validate_headref(const char *ref); int calc_shared_perm(int mode); int adjust_shared_perm(const char *path); diff --git a/setup.c b/setup.c index f4b32f76e3..7c996659bd 100644 --- a/setup.c +++ b/setup.c @@ -4,6 +4,7 @@ #include "environment.h" #include "exec-cmd.h" #include "gettext.h" +#include "hex.h" #include "object-name.h" #include "refs.h" #include "repository.h" @@ -341,6 +342,58 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir) return ret; } +static int validate_headref(const char *path) +{ + struct stat st; + char buffer[256]; + const char *refname; + struct object_id oid; + int fd; + ssize_t len; + + if (lstat(path, &st) < 0) + return -1; + + /* Make sure it is a "refs/.." symlink */ + if (S_ISLNK(st.st_mode)) { + len = readlink(path, buffer, sizeof(buffer)-1); + if (len >= 5 && !memcmp("refs/", buffer, 5)) + return 0; + return -1; + } + + /* + * Anything else, just open it and try to see if it is a symbolic ref. + */ + fd = open(path, O_RDONLY); + if (fd < 0) + return -1; + len = read_in_full(fd, buffer, sizeof(buffer)-1); + close(fd); + + if (len < 0) + return -1; + buffer[len] = '\0'; + + /* + * Is it a symbolic ref? + */ + if (skip_prefix(buffer, "ref:", &refname)) { + while (isspace(*refname)) + refname++; + if (starts_with(refname, "refs/")) + return 0; + } + + /* + * Is this a detached HEAD? + */ + if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN) + return 0; + + return -1; +} + /* * Test if it looks like we're at a git directory. * We want to see: -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v3 03/13] parse-options-cb: only abbreviate hashes when hash algo is known 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 01/13] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 02/13] path: move `validate_headref()` to its only user Patrick Steinhardt @ 2024-04-29 6:34 ` Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 04/13] attr: don't recompute default attribute source Patrick Steinhardt ` (9 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2445 bytes --] The `OPT__ABBREV()` option can be used to add an option that abbreviates object IDs. When given a length longer than `the_hash_algo->hexsz`, then it will instead set the length to that maximum length. It may not always be guaranteed that we have `the_hash_algo` initialized properly as the hash algorithm can only be set up after we have set up `the_repository`. In that case, the hash would always be truncated to the hex length of SHA1, which may not be what the user desires. In practice it's not a problem as all commands that use `OPT__ABBREV()` also have `RUN_SETUP` set and thus cannot work without a repository. Consequently, both `the_repository` and `the_hash_algo` would be properly set up. Regardless of that, harden the code to not truncate the length when we didn't set up a repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- parse-options-cb.c | 3 ++- t/t0040-parse-options.sh | 17 +++++++++++++++++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/parse-options-cb.c b/parse-options-cb.c index bdc7fae497..d99d688d3c 100644 --- a/parse-options-cb.c +++ b/parse-options-cb.c @@ -7,6 +7,7 @@ #include "environment.h" #include "gettext.h" #include "object-name.h" +#include "setup.h" #include "string-list.h" #include "strvec.h" #include "oid-array.h" @@ -29,7 +30,7 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset) opt->long_name); if (v && v < MINIMUM_ABBREV) v = MINIMUM_ABBREV; - else if (v > the_hash_algo->hexsz) + else if (startup_info->have_repository && v > the_hash_algo->hexsz) v = the_hash_algo->hexsz; } *(int *)(opt->value) = v; diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh index 8bb2a8b453..45a773642f 100755 --- a/t/t0040-parse-options.sh +++ b/t/t0040-parse-options.sh @@ -176,6 +176,23 @@ test_expect_success 'long options' ' test_cmp expect output ' +test_expect_success 'abbreviate to something longer than SHA1 length' ' + cat >expect <<-EOF && + boolean: 0 + integer: 0 + magnitude: 0 + timestamp: 0 + string: (not set) + abbrev: 100 + verbose: -1 + quiet: 0 + dry run: no + file: (not set) + EOF + test-tool parse-options --abbrev=100 >output && + test_cmp expect output +' + test_expect_success 'missing required value' ' cat >expect <<-\EOF && error: switch `s'\'' requires a value -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v3 04/13] attr: don't recompute default attribute source 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt ` (2 preceding siblings ...) 2024-04-29 6:34 ` [PATCH v3 03/13] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt @ 2024-04-29 6:34 ` Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 05/13] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt ` (8 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2887 bytes --] The `default_attr_source()` function lazily computes the attr source supposedly once, only. This is done via a static variable `attr_source` that contains the resolved object ID of the attr source's tree. If the variable is the null object ID then we try to look up the attr source, otherwise we skip over it. This approach is flawed though: the variable will never be set to anything else but the null object ID in case there is no attr source. Consequently, we re-compute the information on every call. And in the worst case, when we silently ignore bad trees, this will cause us to try and look up the treeish every single time. Improve this by introducing a separate variable `has_attr_source` to track whether we already computed the attr source and, if so, whether we have an attr source or not. This also allows us to convert the `ignore_bad_attr_tree` to not be static anymore as the code will only be executed once anyway. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- attr.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/attr.c b/attr.c index 679e42258c..9d911aeb31 100644 --- a/attr.c +++ b/attr.c @@ -1206,15 +1206,16 @@ static void collect_some_attrs(struct index_state *istate, } static const char *default_attr_source_tree_object_name; -static int ignore_bad_attr_tree; void set_git_attr_source(const char *tree_object_name) { default_attr_source_tree_object_name = xstrdup(tree_object_name); } -static void compute_default_attr_source(struct object_id *attr_source) +static int compute_default_attr_source(struct object_id *attr_source) { + int ignore_bad_attr_tree = 0; + if (!default_attr_source_tree_object_name) default_attr_source_tree_object_name = getenv(GIT_ATTR_SOURCE_ENVIRONMENT); @@ -1230,22 +1231,28 @@ static void compute_default_attr_source(struct object_id *attr_source) ignore_bad_attr_tree = 1; } - if (!default_attr_source_tree_object_name || !is_null_oid(attr_source)) - return; + if (!default_attr_source_tree_object_name) + return 0; if (repo_get_oid_treeish(the_repository, default_attr_source_tree_object_name, - attr_source) && !ignore_bad_attr_tree) - die(_("bad --attr-source or GIT_ATTR_SOURCE")); + attr_source)) { + if (!ignore_bad_attr_tree) + die(_("bad --attr-source or GIT_ATTR_SOURCE")); + return 0; + } + + return 1; } static struct object_id *default_attr_source(void) { static struct object_id attr_source; + static int has_attr_source = -1; - if (is_null_oid(&attr_source)) - compute_default_attr_source(&attr_source); - if (is_null_oid(&attr_source)) + if (has_attr_source < 0) + has_attr_source = compute_default_attr_source(&attr_source); + if (!has_attr_source) return NULL; return &attr_source; } -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v3 05/13] attr: fix BUG() when parsing attrs outside of repo 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt ` (3 preceding siblings ...) 2024-04-29 6:34 ` [PATCH v3 04/13] attr: don't recompute default attribute source Patrick Steinhardt @ 2024-04-29 6:34 ` Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 06/13] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt ` (7 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2067 bytes --] If either the `--attr-source` option or the `GIT_ATTR_SOURCE` envvar are set, then `compute_default_attr_source()` will try to look up the value as a treeish. It is possible to hit that function while outside of a Git repository though, for example when using `git grep --no-index`. In that case, Git will hit a bug because we try to look up the main ref store outside of a repository. Handle the case gracefully and detect when we try to look up an attr source without a repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- attr.c | 6 ++++++ t/t0003-attributes.sh | 15 +++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/attr.c b/attr.c index 9d911aeb31..4bd09bcb4b 100644 --- a/attr.c +++ b/attr.c @@ -1234,6 +1234,12 @@ static int compute_default_attr_source(struct object_id *attr_source) if (!default_attr_source_tree_object_name) return 0; + if (!startup_info->have_repository) { + if (!ignore_bad_attr_tree) + die(_("cannot use --attr-source or GIT_ATTR_SOURCE without repo")); + return 0; + } + if (repo_get_oid_treeish(the_repository, default_attr_source_tree_object_name, attr_source)) { diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh index 774b52c298..3efdec54dd 100755 --- a/t/t0003-attributes.sh +++ b/t/t0003-attributes.sh @@ -428,6 +428,21 @@ test_expect_success 'precedence of --attr-source, GIT_ATTR_SOURCE, then attr.tre ) ' +test_expect_success 'diff without repository with attr source' ' + mkdir -p "$TRASH_DIRECTORY/outside/nongit" && + ( + cd "$TRASH_DIRECTORY/outside/nongit" && + GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/outside" && + export GIT_CEILING_DIRECTORIES && + touch file && + cat >expect <<-EOF && + fatal: cannot use --attr-source or GIT_ATTR_SOURCE without repo + EOF + test_must_fail env GIT_ATTR_SOURCE=HEAD git grep --no-index foo file 2>err && + test_cmp expect err + ) +' + test_expect_success 'bare repository: with --source' ' ( cd bare.git && -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v3 06/13] remote-curl: fix parsing of detached SHA256 heads 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt ` (4 preceding siblings ...) 2024-04-29 6:34 ` [PATCH v3 05/13] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt @ 2024-04-29 6:34 ` Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 07/13] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt ` (6 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 3673 bytes --] The dumb HTTP transport tries to read the remote HEAD reference by downloading the "HEAD" file and then parsing it via `http_fetch_ref()`. This function will either parse the file as an object ID in case it is exactly `the_hash_algo->hexsz` long, or otherwise it will check whether the reference starts with "ref :" and parse it as a symbolic ref. This is broken when parsing detached HEADs of a remote SHA256 repository because we never update `the_hash_algo` to the discovered remote object hash. Consequently, `the_hash_algo` will always be the fallback SHA1 hash algorithm, which will cause us to fail parsing HEAD altogteher when it contains a SHA256 object ID. Fix this issue by setting up `the_hash_algo` via `repo_set_hash_algo()`. While at it, let's make the expected SHA1 fallback explicit in our code, which also addresses an upcoming issue where we are going to remove the SHA1 fallback for `the_hash_algo`. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- remote-curl.c | 19 ++++++++++++++++++- t/t5550-http-fetch-dumb.sh | 15 +++++++++++++++ 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/remote-curl.c b/remote-curl.c index 0b6d7815fd..004b707fdf 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -266,12 +266,23 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push) return list; } +/* + * Try to detect the hash algorithm used by the remote repository when using + * the dumb HTTP transport. As dumb transports cannot tell us the object hash + * directly have to derive it from the advertised ref lengths. + */ static const struct git_hash_algo *detect_hash_algo(struct discovery *heads) { const char *p = memchr(heads->buf, '\t', heads->len); int algo; + + /* + * In case the remote has no refs we have no way to reliably determine + * the object hash used by that repository. In that case we simply fall + * back to SHA1, which may or may not be correct. + */ if (!p) - return the_hash_algo; + return &hash_algos[GIT_HASH_SHA1]; algo = hash_algo_by_length((p - heads->buf) / 2); if (algo == GIT_HASH_UNKNOWN) @@ -295,6 +306,12 @@ static struct ref *parse_info_refs(struct discovery *heads) "is this a git repository?", transport_anonymize_url(url.buf)); + /* + * Set the repository's hash algo to whatever we have just detected. + * This ensures that we can correctly parse the remote references. + */ + repo_set_hash_algo(the_repository, hash_algo_by_ptr(options.hash_algo)); + data = heads->buf; start = NULL; mid = data; diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh index 4c3b32785d..5f16cbc58d 100755 --- a/t/t5550-http-fetch-dumb.sh +++ b/t/t5550-http-fetch-dumb.sh @@ -55,6 +55,21 @@ test_expect_success 'list refs from outside any repository' ' test_cmp expect actual ' + +test_expect_success 'list detached HEAD from outside any repository' ' + git clone --mirror "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \ + "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" && + git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" \ + update-ref --no-deref HEAD refs/heads/main && + git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" update-server-info && + cat >expect <<-EOF && + $(git rev-parse main) HEAD + $(git rev-parse main) refs/heads/main + EOF + nongit git ls-remote "$HTTPD_URL/dumb/repo-detached.git" >actual && + test_cmp expect actual +' + test_expect_success 'create password-protected repository' ' mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH/auth/dumb/" && cp -Rf "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \ -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v3 07/13] builtin/rev-parse: allow shortening to more than 40 hex characters 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt ` (5 preceding siblings ...) 2024-04-29 6:34 ` [PATCH v3 06/13] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt @ 2024-04-29 6:34 ` Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 08/13] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt ` (5 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1997 bytes --] The `--short=` option for git-rev-parse(1) allows the user to specify to how many characters object IDs should be shortened to. The option is broken though for SHA256 repositories because we set the maximum allowed hash size to `the_hash_algo->hexsz` before we have even set up the repo. Consequently, `the_hash_algo` will always be SHA1 and thus we truncate every hash after at most 40 characters. Fix this by accessing `the_hash_algo` only after we have set up the repo. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/rev-parse.c | 5 ++--- t/t1500-rev-parse.sh | 6 ++++++ 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c index 624182e507..d7b87c605e 100644 --- a/builtin/rev-parse.c +++ b/builtin/rev-parse.c @@ -687,7 +687,6 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix) const char *name = NULL; struct object_context unused; struct strbuf buf = STRBUF_INIT; - const int hexsz = the_hash_algo->hexsz; int seen_end_of_options = 0; enum format_type format = FORMAT_DEFAULT; @@ -863,8 +862,8 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix) abbrev = strtoul(arg, NULL, 10); if (abbrev < MINIMUM_ABBREV) abbrev = MINIMUM_ABBREV; - else if (hexsz <= abbrev) - abbrev = hexsz; + else if ((int)the_hash_algo->hexsz <= abbrev) + abbrev = the_hash_algo->hexsz; continue; } if (!strcmp(arg, "--sq")) { diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh index a669e592f1..30c31918fd 100755 --- a/t/t1500-rev-parse.sh +++ b/t/t1500-rev-parse.sh @@ -304,4 +304,10 @@ test_expect_success 'rev-parse --bisect includes bad, excludes good' ' test_cmp expect actual ' +test_expect_success '--short= truncates to the actual hash length' ' + git rev-parse HEAD >expect && + git rev-parse --short=100 HEAD >actual && + test_cmp expect actual +' + test_done -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v3 08/13] builtin/blame: don't access potentially unitialized `the_hash_algo` 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt ` (6 preceding siblings ...) 2024-04-29 6:34 ` [PATCH v3 07/13] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt @ 2024-04-29 6:34 ` Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 09/13] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt ` (4 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2003 bytes --] We access `the_hash_algo` in git-blame(1) before we have executed `parse_options_start()`, which may not be properly set up in case we have no repository. This is fine for most of the part because all the call paths that lead to it (git-blame(1), git-annotate(1) as well as git-pick-axe(1)) specify `RUN_SETUP` and thus require a repository. There is one exception though, namely when passing `-h` to print the help. Here we will access `the_hash_algo` even if there is no repo. This works fine right now because `the_hash_algo` gets sets up to point to the SHA1 algorithm via `initialize_repository()`. But we're about to stop doing this, and thus the code would lead to a `NULL` pointer exception. Prepare the code for this and only access `the_hash_algo` after we are sure that there is a proper repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/blame.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/builtin/blame.c b/builtin/blame.c index 9aa74680a3..e325825936 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -915,7 +915,6 @@ int cmd_blame(int argc, const char **argv, const char *prefix) struct range_set ranges; unsigned int range_i; long anchor; - const int hexsz = the_hash_algo->hexsz; long num_lines = 0; const char *str_usage = cmd_is_annotate ? annotate_usage : blame_usage; const char **opt_usage = cmd_is_annotate ? annotate_opt_usage : blame_opt_usage; @@ -973,11 +972,11 @@ int cmd_blame(int argc, const char **argv, const char *prefix) } else if (show_progress < 0) show_progress = isatty(2); - if (0 < abbrev && abbrev < hexsz) + if (0 < abbrev && abbrev < (int)the_hash_algo->hexsz) /* one more abbrev length is needed for the boundary commit */ abbrev++; else if (!abbrev) - abbrev = hexsz; + abbrev = the_hash_algo->hexsz; if (revs_file && read_ancestry(revs_file)) die_errno("reading graft file '%s' failed", revs_file); -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v3 09/13] builtin/bundle: abort "verify" early when there is no repository 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt ` (7 preceding siblings ...) 2024-04-29 6:34 ` [PATCH v3 08/13] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt @ 2024-04-29 6:34 ` Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 10/13] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt ` (3 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1528 bytes --] Verifying a bundle requires us to have a repository. This is encoded in `verify_bundle()`, which will return an error if there is no repository. We call `open_bundle()` before we call `verify_bundle()` though, which already performs some verifications even though we may ultimately abort due to a missing repository. This is problematic because `open_bundle()` already reads the bundle header and verifies that it contains a properly formatted hash. When there is no repository we have no clue what hash function to expect though, so we always end up assuming SHA1 here, which may or may not be correct. Furthermore, we are about to stop initializing `the_hash_algo` when there is no repository, which will lead to segfaults. Check early on whether we have a repository to fix this issue. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/bundle.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/builtin/bundle.c b/builtin/bundle.c index 3ad11dc5d0..d5d41a8f67 100644 --- a/builtin/bundle.c +++ b/builtin/bundle.c @@ -140,6 +140,11 @@ static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) { builtin_bundle_verify_usage, options, &bundle_file); /* bundle internals use argv[1] as further parameters */ + if (!startup_info->have_repository) { + ret = error(_("need a repository to verify a bundle")); + goto cleanup; + } + if ((bundle_fd = open_bundle(bundle_file, &header, &name)) < 0) { ret = 1; goto cleanup; -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v3 10/13] builtin/diff: explicitly set hash algo when there is no repo 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt ` (8 preceding siblings ...) 2024-04-29 6:34 ` [PATCH v3 09/13] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt @ 2024-04-29 6:34 ` Patrick Steinhardt 2024-04-29 6:35 ` [PATCH v3 11/13] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt ` (2 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1849 bytes --] The git-diff(1) command can be used outside repositories to diff two files with each other. But even if there is no repository we will end up hashing the files that we are diffing so that we can print the "index" line: ``` diff --git a/a b/b index 7898192..6178079 100644 --- a/a +++ b/b @@ -1 +1 @@ -a +b ``` We implicitly use SHA1 to calculate the hash here, which is because `the_repository` gets initialized with SHA1 during the startup routine. We are about to stop doing this though such that `the_repository` only ever has a hash function when it was properly initialized via a repo's configuration. To give full control to our users, we would ideally add a new switch to git-diff(1) that allows them to specify the hash function when executed outside of a repository. But for now, we only convert the code to make this explicit such that we can stop setting the default hash algorithm for `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/diff.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/builtin/diff.c b/builtin/diff.c index 6e196e0c7d..58ec7e5da2 100644 --- a/builtin/diff.c +++ b/builtin/diff.c @@ -465,6 +465,15 @@ int cmd_diff(int argc, const char **argv, const char *prefix) no_index = DIFF_NO_INDEX_IMPLICIT; } + /* + * When operating outside of a Git repository we need to have a hash + * algorithm at hand so that we can generate the blob hashes. We + * default to SHA1 here, but may eventually want to change this to be + * configurable via a command line option. + */ + if (nongit) + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + init_diff_ui_defaults(); git_config(git_diff_ui_config, NULL); prefix = precompose_argv_prefix(argc, argv, prefix); -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v3 11/13] builtin/shortlog: don't set up revisions without repo 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt ` (9 preceding siblings ...) 2024-04-29 6:34 ` [PATCH v3 10/13] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt @ 2024-04-29 6:35 ` Patrick Steinhardt 2024-04-29 6:35 ` [PATCH v3 12/13] oss-fuzz/commit-graph: set up hash algorithm Patrick Steinhardt 2024-04-29 6:35 ` [PATCH v3 13/13] repository: stop setting SHA1 as the default object hash Patrick Steinhardt 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:35 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1300 bytes --] It is possible to run git-shortlog(1) outside of a repository by passing it output from git-log(1) via standard input. Obviously, as there is no repository in that context, it is thus unsupported to pass any revisions as arguments. Regardless of that we still end up calling `setup_revisions()`. While that works alright, it is somewhat strange. Furthermore, this is about to cause problems when we unset the default object hash. Refactor the code to only call `setup_revisions()` when we have a repository. This is safe to do as we already verify that there are no arguments when running outside of a repository anyway. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/shortlog.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/shortlog.c b/builtin/shortlog.c index 3c7cd2d6ef..d4daf31e22 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -435,7 +435,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix) usage_with_options(shortlog_usage, options); } - if (setup_revisions(argc, argv, &rev, NULL) != 1) { + if (!nongit && setup_revisions(argc, argv, &rev, NULL) != 1) { error(_("unrecognized argument: %s"), argv[1]); usage_with_options(shortlog_usage, options); } -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v3 12/13] oss-fuzz/commit-graph: set up hash algorithm 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt ` (10 preceding siblings ...) 2024-04-29 6:35 ` [PATCH v3 11/13] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt @ 2024-04-29 6:35 ` Patrick Steinhardt 2024-04-29 6:35 ` [PATCH v3 13/13] repository: stop setting SHA1 as the default object hash Patrick Steinhardt 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:35 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1117 bytes --] Our fuzzing setups don't work in a proper repository, but only use the in-memory configured `the_repository`. Consequently, we never go through the full repository setup procedures and thus do not set up the hash algo used by the repository. The commit-graph fuzzer does rely on a properly initialized hash algo though. Initialize it explicitly. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- oss-fuzz/fuzz-commit-graph.c | 1 + 1 file changed, 1 insertion(+) diff --git a/oss-fuzz/fuzz-commit-graph.c b/oss-fuzz/fuzz-commit-graph.c index 2992079dd9..94ecbb9242 100644 --- a/oss-fuzz/fuzz-commit-graph.c +++ b/oss-fuzz/fuzz-commit-graph.c @@ -18,6 +18,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) * touching the disk to keep the individual fuzz-test cases as fast as * possible. */ + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); the_repository->settings.commit_graph_generation_version = 2; the_repository->settings.commit_graph_read_changed_paths = 1; g = parse_commit_graph(&the_repository->settings, (void *)data, size); -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v3 13/13] repository: stop setting SHA1 as the default object hash 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt ` (11 preceding siblings ...) 2024-04-29 6:35 ` [PATCH v3 12/13] oss-fuzz/commit-graph: set up hash algorithm Patrick Steinhardt @ 2024-04-29 6:35 ` Patrick Steinhardt 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-04-29 6:35 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1696 bytes --] During the startup of Git, we call `initialize_the_repository()` to set up `the_repository` as well as `the_index`. Part of this setup is also to set the default object hash of the repository to SHA1. This has the effect that `the_hash_algo` is getting initialized to SHA1, as well. This default hash algorithm eventually gets overridden by most Git commands via `setup_git_directory()`, which also detects the actual hash algorithm used by the repository. There are some commands though that don't access a repository at all, or at a later point only, and thus retain the default hash function for some amount of time. As some of the the preceding commits demonstrate, this can lead to subtle issues when we access `the_hash_algo` when no repository has been set up. Address this issue by dropping the set up of the default hash algorithm completely. The effect of this is that `the_hash_algo` will map to a `NULL` pointer and thus cause Git to crash when something tries to access the hash algorithm without it being properly initialized. It thus forces all Git commands to explicitly set up the hash algorithm in case there is no repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- repository.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/repository.c b/repository.c index e15b416944..b65b1a8c8b 100644 --- a/repository.c +++ b/repository.c @@ -35,8 +35,6 @@ void initialize_the_repository(void) the_repo.parsed_objects = parsed_object_pool_new(); index_state_init(&the_index, the_repository); - - repo_set_hash_algo(&the_repo, GIT_HASH_SHA1); } static void expand_base_dir(char **out, const char *in, -- 2.45.0-rc1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (13 preceding siblings ...) 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt @ 2024-05-07 4:52 ` Patrick Steinhardt 2024-05-07 4:52 ` [PATCH v4 01/13] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt ` (12 more replies) 14 siblings, 13 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:52 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 5446 bytes --] Hi, this is the third version of my patch series that stops relying on the SHA1 fallback configured for `the_hash_algo`. Instead, any callers that access `the_hash_algo` without it being initialized will now crash hard. This is designed to catch various subtle bugs going forward. The only change compared to v3 of this patch series is a rebase on top of the following two topics: - ps/the-index-is-no-more, which causes a conflict in the final patch. - jc/no-default-attr-tree-in-bare, which causes a conflict in the fourth patch. Interestingly, the range diff does not surface the second resolved conflict. This is because the context is actually the same, but when applying the patch in question you would get a conflict. To aid with this I decided to generate patches with `--unified=4`, which surfaces the previously-conflicting hunk in `compute_default_attr_source()`. Patrick Patrick Steinhardt (13): path: harden validation of HEAD with non-standard hashes path: move `validate_headref()` to its only user parse-options-cb: only abbreviate hashes when hash algo is known attr: don't recompute default attribute source attr: fix BUG() when parsing attrs outside of repo remote-curl: fix parsing of detached SHA256 heads builtin/rev-parse: allow shortening to more than 40 hex characters builtin/blame: don't access potentially unitialized `the_hash_algo` builtin/bundle: abort "verify" early when there is no repository builtin/diff: explicitly set hash algo when there is no repo builtin/shortlog: don't set up revisions without repo oss-fuzz/commit-graph: set up hash algorithm repository: stop setting SHA1 as the default object hash attr.c | 31 +++++++++++++++------ builtin/blame.c | 5 ++-- builtin/bundle.c | 5 ++++ builtin/diff.c | 9 ++++++ builtin/rev-parse.c | 5 ++-- builtin/shortlog.c | 2 +- oss-fuzz/fuzz-commit-graph.c | 1 + parse-options-cb.c | 3 +- path.c | 53 ------------------------------------ path.h | 1 - remote-curl.c | 19 ++++++++++++- repository.c | 20 -------------- setup.c | 53 ++++++++++++++++++++++++++++++++++++ t/t0003-attributes.sh | 15 ++++++++++ t/t0040-parse-options.sh | 17 ++++++++++++ t/t1500-rev-parse.sh | 6 ++++ t/t5550-http-fetch-dumb.sh | 15 ++++++++++ 17 files changed, 168 insertions(+), 92 deletions(-) Range-diff against v3: 1: 5134f35cda = 1: 5ee372c2d8 path: harden validation of HEAD with non-standard hashes 2: 589b6a99ef = 2: ece0ab94a8 path: move `validate_headref()` to its only user 3: 9a63c445d2 = 3: 1a0859eaf1 parse-options-cb: only abbreviate hashes when hash algo is known 4: 929bacbfce = 4: 1204a34216 attr: don't recompute default attribute source 5: 8f20aec1ee = 5: f098ce9690 attr: fix BUG() when parsing attrs outside of repo 6: 53439067a1 = 6: e8876052ad remote-curl: fix parsing of detached SHA256 heads 7: 1f74960760 = 7: 6116030310 builtin/rev-parse: allow shortening to more than 40 hex characters 8: 2d985abca1 = 8: 872ded113e builtin/blame: don't access potentially unitialized `the_hash_algo` 9: f3b23d28aa = 9: 5b4a21d2ce builtin/bundle: abort "verify" early when there is no repository 10: 7577b6b96c = 10: e7254ae757 builtin/diff: explicitly set hash algo when there is no repo 11: 509c79d1d3 = 11: c21b15ba17 builtin/shortlog: don't set up revisions without repo 12: 660f976129 = 12: f37beb0e89 oss-fuzz/commit-graph: set up hash algorithm 13: 95909c2da5 ! 13: 950b08bc78 repository: stop setting SHA1 as the default object hash @@ Commit message Signed-off-by: Patrick Steinhardt <ps@pks.im> ## repository.c ## -@@ repository.c: void initialize_the_repository(void) - the_repo.parsed_objects = parsed_object_pool_new(); - - index_state_init(&the_index, the_repository); +@@ repository.c: void initialize_repository(struct repository *repo) + repo->parsed_objects = parsed_object_pool_new(); + ALLOC_ARRAY(repo->index, 1); + index_state_init(repo->index, repo); - -- repo_set_hash_algo(&the_repo, GIT_HASH_SHA1); +- /* +- * Unfortunately, we need to keep this hack around for the time being: +- * +- * - Not setting up the hash algorithm for `the_repository` leads to +- * crashes because `the_hash_algo` is a macro that expands to +- * `the_repository->hash_algo`. So if Git commands try to access +- * `the_hash_algo` without a Git directory we crash. +- * +- * - Setting up the hash algorithm to be SHA1 by default breaks other +- * commands when running with SHA256. +- * +- * This is another point in case why having global state is a bad idea. +- * Eventually, we should remove this hack and stop setting the hash +- * algorithm in this function altogether. Instead, it should only ever +- * be set via our repository setup procedures. But that requires more +- * work. +- */ +- if (repo == the_repository) +- repo_set_hash_algo(repo, GIT_HASH_SHA1); } static void expand_base_dir(char **out, const char *in, -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH v4 01/13] path: harden validation of HEAD with non-standard hashes 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt @ 2024-05-07 4:52 ` Patrick Steinhardt 2024-05-07 4:52 ` [PATCH v4 02/13] path: move `validate_headref()` to its only user Patrick Steinhardt ` (11 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:52 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2282 bytes --] The `validate_headref()` function takes a path to a supposed "HEAD" file and checks whether its format is something that we understand. It is used as part of our repository discovery to check whether a specific directory is a Git directory or not. Part of the validation is a check for a detached HEAD that contains a plain object ID. To do this validation we use `get_oid_hex()`, which relies on `the_hash_algo`. At this point in time the hash algo cannot yet be initialized though because we didn't yet read the Git config. Consequently, it will always be the SHA1 hash algorithm. In practice this works alright because `get_oid_hex()` only ends up checking whether the prefix of the buffer is a valid object ID. And because SHA1 is shorter than SHA256, the function will successfully parse SHA256 object IDs, as well. It is somewhat fragile though and not really the intent to only check for SHA1. With this in mind, harden the code to use `get_oid_hex_any()` to check whether the "HEAD" file parses as any known hash. One might be hard pressed to tighten the check even further and fully validate the file contents, not only the prefix. In practice though that wouldn't make a lot of sense as it could be that the repository uses a hash function that produces longer hashes than SHA256, but which the current version of Git doesn't understand yet. We'd still want to detect the repository as proper Git repository in that case, and we will fail eventually with a proper error message that the hash isn't understood when trying to set up the repository format. It follows that we could just leave the current code intact, as in practice the code change doesn't have any user visible impact. But it also prepares us for `the_hash_algo` being unset when there is no repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- path.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/path.c b/path.c index 67229edb9c..cc02165530 100644 --- a/path.c +++ b/path.c @@ -692,9 +692,9 @@ int validate_headref(const char *path) /* * Is this a detached HEAD? */ - if (!get_oid_hex(buffer, &oid)) + if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN) return 0; return -1; } -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v4 02/13] path: move `validate_headref()` to its only user 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt 2024-05-07 4:52 ` [PATCH v4 01/13] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt @ 2024-05-07 4:52 ` Patrick Steinhardt 2024-05-07 4:52 ` [PATCH v4 03/13] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt ` (10 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:52 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 4235 bytes --] While `validate_headref()` is only called from `is_git_directory()` in "setup.c", it is currently implemented in "path.c". Move it over such that it becomes clear that it is only really used during setup in order to discover repositories. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- path.c | 53 ----------------------------------------------------- path.h | 1 - setup.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 53 insertions(+), 54 deletions(-) diff --git a/path.c b/path.c index cc02165530..bd6e25245d 100644 --- a/path.c +++ b/path.c @@ -4,9 +4,8 @@ #include "git-compat-util.h" #include "abspath.h" #include "environment.h" #include "gettext.h" -#include "hex.h" #include "repository.h" #include "strbuf.h" #include "string-list.h" #include "dir.h" @@ -646,60 +645,8 @@ void strbuf_git_common_path(struct strbuf *sb, do_git_common_path(repo, sb, fmt, args); va_end(args); } -int validate_headref(const char *path) -{ - struct stat st; - char buffer[256]; - const char *refname; - struct object_id oid; - int fd; - ssize_t len; - - if (lstat(path, &st) < 0) - return -1; - - /* Make sure it is a "refs/.." symlink */ - if (S_ISLNK(st.st_mode)) { - len = readlink(path, buffer, sizeof(buffer)-1); - if (len >= 5 && !memcmp("refs/", buffer, 5)) - return 0; - return -1; - } - - /* - * Anything else, just open it and try to see if it is a symbolic ref. - */ - fd = open(path, O_RDONLY); - if (fd < 0) - return -1; - len = read_in_full(fd, buffer, sizeof(buffer)-1); - close(fd); - - if (len < 0) - return -1; - buffer[len] = '\0'; - - /* - * Is it a symbolic ref? - */ - if (skip_prefix(buffer, "ref:", &refname)) { - while (isspace(*refname)) - refname++; - if (starts_with(refname, "refs/")) - return 0; - } - - /* - * Is this a detached HEAD? - */ - if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN) - return 0; - - return -1; -} - static struct passwd *getpw_str(const char *username, size_t len) { struct passwd *pw; char *username_z = xmemdupz(username, len); diff --git a/path.h b/path.h index ea96487b29..c3bc8617bd 100644 --- a/path.h +++ b/path.h @@ -172,9 +172,8 @@ const char *git_path_merge_head(struct repository *r); const char *git_path_fetch_head(struct repository *r); const char *git_path_shallow(struct repository *r); int ends_with_path_components(const char *path, const char *components); -int validate_headref(const char *ref); int calc_shared_perm(int mode); int adjust_shared_perm(const char *path); diff --git a/setup.c b/setup.c index f4b32f76e3..7c996659bd 100644 --- a/setup.c +++ b/setup.c @@ -3,8 +3,9 @@ #include "copy.h" #include "environment.h" #include "exec-cmd.h" #include "gettext.h" +#include "hex.h" #include "object-name.h" #include "refs.h" #include "repository.h" #include "config.h" @@ -340,8 +341,60 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir) strbuf_release(&path); return ret; } +static int validate_headref(const char *path) +{ + struct stat st; + char buffer[256]; + const char *refname; + struct object_id oid; + int fd; + ssize_t len; + + if (lstat(path, &st) < 0) + return -1; + + /* Make sure it is a "refs/.." symlink */ + if (S_ISLNK(st.st_mode)) { + len = readlink(path, buffer, sizeof(buffer)-1); + if (len >= 5 && !memcmp("refs/", buffer, 5)) + return 0; + return -1; + } + + /* + * Anything else, just open it and try to see if it is a symbolic ref. + */ + fd = open(path, O_RDONLY); + if (fd < 0) + return -1; + len = read_in_full(fd, buffer, sizeof(buffer)-1); + close(fd); + + if (len < 0) + return -1; + buffer[len] = '\0'; + + /* + * Is it a symbolic ref? + */ + if (skip_prefix(buffer, "ref:", &refname)) { + while (isspace(*refname)) + refname++; + if (starts_with(refname, "refs/")) + return 0; + } + + /* + * Is this a detached HEAD? + */ + if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN) + return 0; + + return -1; +} + /* * Test if it looks like we're at a git directory. * We want to see: * -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v4 03/13] parse-options-cb: only abbreviate hashes when hash algo is known 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt 2024-05-07 4:52 ` [PATCH v4 01/13] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt 2024-05-07 4:52 ` [PATCH v4 02/13] path: move `validate_headref()` to its only user Patrick Steinhardt @ 2024-05-07 4:52 ` Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 04/13] attr: don't recompute default attribute source Patrick Steinhardt ` (9 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:52 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2582 bytes --] The `OPT__ABBREV()` option can be used to add an option that abbreviates object IDs. When given a length longer than `the_hash_algo->hexsz`, then it will instead set the length to that maximum length. It may not always be guaranteed that we have `the_hash_algo` initialized properly as the hash algorithm can only be set up after we have set up `the_repository`. In that case, the hash would always be truncated to the hex length of SHA1, which may not be what the user desires. In practice it's not a problem as all commands that use `OPT__ABBREV()` also have `RUN_SETUP` set and thus cannot work without a repository. Consequently, both `the_repository` and `the_hash_algo` would be properly set up. Regardless of that, harden the code to not truncate the length when we didn't set up a repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- parse-options-cb.c | 3 ++- t/t0040-parse-options.sh | 17 +++++++++++++++++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/parse-options-cb.c b/parse-options-cb.c index bdc7fae497..d99d688d3c 100644 --- a/parse-options-cb.c +++ b/parse-options-cb.c @@ -6,8 +6,9 @@ #include "date.h" #include "environment.h" #include "gettext.h" #include "object-name.h" +#include "setup.h" #include "string-list.h" #include "strvec.h" #include "oid-array.h" @@ -28,9 +29,9 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset) return error(_("option `%s' expects a numerical value"), opt->long_name); if (v && v < MINIMUM_ABBREV) v = MINIMUM_ABBREV; - else if (v > the_hash_algo->hexsz) + else if (startup_info->have_repository && v > the_hash_algo->hexsz) v = the_hash_algo->hexsz; } *(int *)(opt->value) = v; return 0; diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh index 8bb2a8b453..45a773642f 100755 --- a/t/t0040-parse-options.sh +++ b/t/t0040-parse-options.sh @@ -175,8 +175,25 @@ test_expect_success 'long options' ' test_must_be_empty output.err && test_cmp expect output ' +test_expect_success 'abbreviate to something longer than SHA1 length' ' + cat >expect <<-EOF && + boolean: 0 + integer: 0 + magnitude: 0 + timestamp: 0 + string: (not set) + abbrev: 100 + verbose: -1 + quiet: 0 + dry run: no + file: (not set) + EOF + test-tool parse-options --abbrev=100 >output && + test_cmp expect output +' + test_expect_success 'missing required value' ' cat >expect <<-\EOF && error: switch `s'\'' requires a value EOF -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v4 04/13] attr: don't recompute default attribute source 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (2 preceding siblings ...) 2024-05-07 4:52 ` [PATCH v4 03/13] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt @ 2024-05-07 4:53 ` Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 05/13] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt ` (8 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:53 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2991 bytes --] The `default_attr_source()` function lazily computes the attr source supposedly once, only. This is done via a static variable `attr_source` that contains the resolved object ID of the attr source's tree. If the variable is the null object ID then we try to look up the attr source, otherwise we skip over it. This approach is flawed though: the variable will never be set to anything else but the null object ID in case there is no attr source. Consequently, we re-compute the information on every call. And in the worst case, when we silently ignore bad trees, this will cause us to try and look up the treeish every single time. Improve this by introducing a separate variable `has_attr_source` to track whether we already computed the attr source and, if so, whether we have an attr source or not. This also allows us to convert the `ignore_bad_attr_tree` to not be static anymore as the code will only be executed once anyway. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- attr.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/attr.c b/attr.c index 6af7151088..a5b717e4ce 100644 --- a/attr.c +++ b/attr.c @@ -1205,41 +1205,48 @@ static void collect_some_attrs(struct index_state *istate, fill(path, pathlen, basename_offset, check->stack, check->all_attrs, rem); } static const char *default_attr_source_tree_object_name; -static int ignore_bad_attr_tree; void set_git_attr_source(const char *tree_object_name) { default_attr_source_tree_object_name = xstrdup(tree_object_name); } -static void compute_default_attr_source(struct object_id *attr_source) +static int compute_default_attr_source(struct object_id *attr_source) { + int ignore_bad_attr_tree = 0; + if (!default_attr_source_tree_object_name) default_attr_source_tree_object_name = getenv(GIT_ATTR_SOURCE_ENVIRONMENT); if (!default_attr_source_tree_object_name && git_attr_tree) { default_attr_source_tree_object_name = git_attr_tree; ignore_bad_attr_tree = 1; } - if (!default_attr_source_tree_object_name || !is_null_oid(attr_source)) - return; + if (!default_attr_source_tree_object_name) + return 0; if (repo_get_oid_treeish(the_repository, default_attr_source_tree_object_name, - attr_source) && !ignore_bad_attr_tree) - die(_("bad --attr-source or GIT_ATTR_SOURCE")); + attr_source)) { + if (!ignore_bad_attr_tree) + die(_("bad --attr-source or GIT_ATTR_SOURCE")); + return 0; + } + + return 1; } static struct object_id *default_attr_source(void) { static struct object_id attr_source; + static int has_attr_source = -1; - if (is_null_oid(&attr_source)) - compute_default_attr_source(&attr_source); - if (is_null_oid(&attr_source)) + if (has_attr_source < 0) + has_attr_source = compute_default_attr_source(&attr_source); + if (!has_attr_source) return NULL; return &attr_source; } -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v4 05/13] attr: fix BUG() when parsing attrs outside of repo 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (3 preceding siblings ...) 2024-05-07 4:53 ` [PATCH v4 04/13] attr: don't recompute default attribute source Patrick Steinhardt @ 2024-05-07 4:53 ` Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 06/13] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt ` (7 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:53 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2167 bytes --] If either the `--attr-source` option or the `GIT_ATTR_SOURCE` envvar are set, then `compute_default_attr_source()` will try to look up the value as a treeish. It is possible to hit that function while outside of a Git repository though, for example when using `git grep --no-index`. In that case, Git will hit a bug because we try to look up the main ref store outside of a repository. Handle the case gracefully and detect when we try to look up an attr source without a repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- attr.c | 6 ++++++ t/t0003-attributes.sh | 15 +++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/attr.c b/attr.c index a5b717e4ce..c89ab3478e 100644 --- a/attr.c +++ b/attr.c @@ -1226,8 +1226,14 @@ static int compute_default_attr_source(struct object_id *attr_source) if (!default_attr_source_tree_object_name) return 0; + if (!startup_info->have_repository) { + if (!ignore_bad_attr_tree) + die(_("cannot use --attr-source or GIT_ATTR_SOURCE without repo")); + return 0; + } + if (repo_get_oid_treeish(the_repository, default_attr_source_tree_object_name, attr_source)) { if (!ignore_bad_attr_tree) diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh index d755cc3c29..72fadca1e8 100755 --- a/t/t0003-attributes.sh +++ b/t/t0003-attributes.sh @@ -433,8 +433,23 @@ test_expect_success 'precedence of --attr-source, GIT_ATTR_SOURCE, then attr.tre test_cmp expect actual ) ' +test_expect_success 'diff without repository with attr source' ' + mkdir -p "$TRASH_DIRECTORY/outside/nongit" && + ( + cd "$TRASH_DIRECTORY/outside/nongit" && + GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/outside" && + export GIT_CEILING_DIRECTORIES && + touch file && + cat >expect <<-EOF && + fatal: cannot use --attr-source or GIT_ATTR_SOURCE without repo + EOF + test_must_fail env GIT_ATTR_SOURCE=HEAD git grep --no-index foo file 2>err && + test_cmp expect err + ) +' + test_expect_success 'bare repository: with --source' ' ( cd bare.git && attr_check_source foo/bar/f f tag-1 && -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v4 06/13] remote-curl: fix parsing of detached SHA256 heads 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (4 preceding siblings ...) 2024-05-07 4:53 ` [PATCH v4 05/13] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt @ 2024-05-07 4:53 ` Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 07/13] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt ` (6 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:53 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 3906 bytes --] The dumb HTTP transport tries to read the remote HEAD reference by downloading the "HEAD" file and then parsing it via `http_fetch_ref()`. This function will either parse the file as an object ID in case it is exactly `the_hash_algo->hexsz` long, or otherwise it will check whether the reference starts with "ref :" and parse it as a symbolic ref. This is broken when parsing detached HEADs of a remote SHA256 repository because we never update `the_hash_algo` to the discovered remote object hash. Consequently, `the_hash_algo` will always be the fallback SHA1 hash algorithm, which will cause us to fail parsing HEAD altogteher when it contains a SHA256 object ID. Fix this issue by setting up `the_hash_algo` via `repo_set_hash_algo()`. While at it, let's make the expected SHA1 fallback explicit in our code, which also addresses an upcoming issue where we are going to remove the SHA1 fallback for `the_hash_algo`. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- remote-curl.c | 19 ++++++++++++++++++- t/t5550-http-fetch-dumb.sh | 15 +++++++++++++++ 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/remote-curl.c b/remote-curl.c index 0b6d7815fd..004b707fdf 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -265,14 +265,25 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push) return list; } +/* + * Try to detect the hash algorithm used by the remote repository when using + * the dumb HTTP transport. As dumb transports cannot tell us the object hash + * directly have to derive it from the advertised ref lengths. + */ static const struct git_hash_algo *detect_hash_algo(struct discovery *heads) { const char *p = memchr(heads->buf, '\t', heads->len); int algo; + + /* + * In case the remote has no refs we have no way to reliably determine + * the object hash used by that repository. In that case we simply fall + * back to SHA1, which may or may not be correct. + */ if (!p) - return the_hash_algo; + return &hash_algos[GIT_HASH_SHA1]; algo = hash_algo_by_length((p - heads->buf) / 2); if (algo == GIT_HASH_UNKNOWN) return NULL; @@ -294,8 +305,14 @@ static struct ref *parse_info_refs(struct discovery *heads) die("%sinfo/refs not valid: could not determine hash algorithm; " "is this a git repository?", transport_anonymize_url(url.buf)); + /* + * Set the repository's hash algo to whatever we have just detected. + * This ensures that we can correctly parse the remote references. + */ + repo_set_hash_algo(the_repository, hash_algo_by_ptr(options.hash_algo)); + data = heads->buf; start = NULL; mid = data; while (i < heads->len) { diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh index 4c3b32785d..5f16cbc58d 100755 --- a/t/t5550-http-fetch-dumb.sh +++ b/t/t5550-http-fetch-dumb.sh @@ -54,8 +54,23 @@ test_expect_success 'list refs from outside any repository' ' nongit git ls-remote "$HTTPD_URL/dumb/repo.git" >actual && test_cmp expect actual ' + +test_expect_success 'list detached HEAD from outside any repository' ' + git clone --mirror "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \ + "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" && + git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" \ + update-ref --no-deref HEAD refs/heads/main && + git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" update-server-info && + cat >expect <<-EOF && + $(git rev-parse main) HEAD + $(git rev-parse main) refs/heads/main + EOF + nongit git ls-remote "$HTTPD_URL/dumb/repo-detached.git" >actual && + test_cmp expect actual +' + test_expect_success 'create password-protected repository' ' mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH/auth/dumb/" && cp -Rf "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \ "$HTTPD_DOCUMENT_ROOT_PATH/auth/dumb/repo.git" -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v4 07/13] builtin/rev-parse: allow shortening to more than 40 hex characters 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (5 preceding siblings ...) 2024-05-07 4:53 ` [PATCH v4 06/13] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt @ 2024-05-07 4:53 ` Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 08/13] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt ` (5 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:53 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2169 bytes --] The `--short=` option for git-rev-parse(1) allows the user to specify to how many characters object IDs should be shortened to. The option is broken though for SHA256 repositories because we set the maximum allowed hash size to `the_hash_algo->hexsz` before we have even set up the repo. Consequently, `the_hash_algo` will always be SHA1 and thus we truncate every hash after at most 40 characters. Fix this by accessing `the_hash_algo` only after we have set up the repo. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/rev-parse.c | 5 ++--- t/t1500-rev-parse.sh | 6 ++++++ 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c index af79538632..7d10ee33c4 100644 --- a/builtin/rev-parse.c +++ b/builtin/rev-parse.c @@ -686,9 +686,8 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix) unsigned int flags = 0; const char *name = NULL; struct object_context unused; struct strbuf buf = STRBUF_INIT; - const int hexsz = the_hash_algo->hexsz; int seen_end_of_options = 0; enum format_type format = FORMAT_DEFAULT; if (argc > 1 && !strcmp("--parseopt", argv[1])) @@ -862,10 +861,10 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix) continue; abbrev = strtoul(arg, NULL, 10); if (abbrev < MINIMUM_ABBREV) abbrev = MINIMUM_ABBREV; - else if (hexsz <= abbrev) - abbrev = hexsz; + else if ((int)the_hash_algo->hexsz <= abbrev) + abbrev = the_hash_algo->hexsz; continue; } if (!strcmp(arg, "--sq")) { output_sq = 1; diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh index a669e592f1..30c31918fd 100755 --- a/t/t1500-rev-parse.sh +++ b/t/t1500-rev-parse.sh @@ -303,5 +303,11 @@ test_expect_success 'rev-parse --bisect includes bad, excludes good' ' git rev-parse --symbolic-full-name --bisect >actual && test_cmp expect actual ' +test_expect_success '--short= truncates to the actual hash length' ' + git rev-parse HEAD >expect && + git rev-parse --short=100 HEAD >actual && + test_cmp expect actual +' + test_done -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v4 08/13] builtin/blame: don't access potentially unitialized `the_hash_algo` 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (6 preceding siblings ...) 2024-05-07 4:53 ` [PATCH v4 07/13] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt @ 2024-05-07 4:53 ` Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 09/13] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt ` (4 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:53 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2083 bytes --] We access `the_hash_algo` in git-blame(1) before we have executed `parse_options_start()`, which may not be properly set up in case we have no repository. This is fine for most of the part because all the call paths that lead to it (git-blame(1), git-annotate(1) as well as git-pick-axe(1)) specify `RUN_SETUP` and thus require a repository. There is one exception though, namely when passing `-h` to print the help. Here we will access `the_hash_algo` even if there is no repo. This works fine right now because `the_hash_algo` gets sets up to point to the SHA1 algorithm via `initialize_repository()`. But we're about to stop doing this, and thus the code would lead to a `NULL` pointer exception. Prepare the code for this and only access `the_hash_algo` after we are sure that there is a proper repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/blame.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/builtin/blame.c b/builtin/blame.c index 9aa74680a3..e325825936 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -914,9 +914,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix) int cmd_is_annotate = !strcmp(argv[0], "annotate"); struct range_set ranges; unsigned int range_i; long anchor; - const int hexsz = the_hash_algo->hexsz; long num_lines = 0; const char *str_usage = cmd_is_annotate ? annotate_usage : blame_usage; const char **opt_usage = cmd_is_annotate ? annotate_opt_usage : blame_opt_usage; @@ -972,13 +971,13 @@ int cmd_blame(int argc, const char **argv, const char *prefix) show_progress = 0; } else if (show_progress < 0) show_progress = isatty(2); - if (0 < abbrev && abbrev < hexsz) + if (0 < abbrev && abbrev < (int)the_hash_algo->hexsz) /* one more abbrev length is needed for the boundary commit */ abbrev++; else if (!abbrev) - abbrev = hexsz; + abbrev = the_hash_algo->hexsz; if (revs_file && read_ancestry(revs_file)) die_errno("reading graft file '%s' failed", revs_file); -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v4 09/13] builtin/bundle: abort "verify" early when there is no repository 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (7 preceding siblings ...) 2024-05-07 4:53 ` [PATCH v4 08/13] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt @ 2024-05-07 4:53 ` Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 10/13] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt ` (3 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:53 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1584 bytes --] Verifying a bundle requires us to have a repository. This is encoded in `verify_bundle()`, which will return an error if there is no repository. We call `open_bundle()` before we call `verify_bundle()` though, which already performs some verifications even though we may ultimately abort due to a missing repository. This is problematic because `open_bundle()` already reads the bundle header and verifies that it contains a properly formatted hash. When there is no repository we have no clue what hash function to expect though, so we always end up assuming SHA1 here, which may or may not be correct. Furthermore, we are about to stop initializing `the_hash_algo` when there is no repository, which will lead to segfaults. Check early on whether we have a repository to fix this issue. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/bundle.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/builtin/bundle.c b/builtin/bundle.c index 3ad11dc5d0..d5d41a8f67 100644 --- a/builtin/bundle.c +++ b/builtin/bundle.c @@ -139,8 +139,13 @@ static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) { argc = parse_options_cmd_bundle(argc, argv, prefix, builtin_bundle_verify_usage, options, &bundle_file); /* bundle internals use argv[1] as further parameters */ + if (!startup_info->have_repository) { + ret = error(_("need a repository to verify a bundle")); + goto cleanup; + } + if ((bundle_fd = open_bundle(bundle_file, &header, &name)) < 0) { ret = 1; goto cleanup; } -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v4 10/13] builtin/diff: explicitly set hash algo when there is no repo 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (8 preceding siblings ...) 2024-05-07 4:53 ` [PATCH v4 09/13] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt @ 2024-05-07 4:53 ` Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 11/13] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt ` (2 subsequent siblings) 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:53 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1896 bytes --] The git-diff(1) command can be used outside repositories to diff two files with each other. But even if there is no repository we will end up hashing the files that we are diffing so that we can print the "index" line: ``` diff --git a/a b/b index 7898192..6178079 100644 --- a/a +++ b/b @@ -1 +1 @@ -a +b ``` We implicitly use SHA1 to calculate the hash here, which is because `the_repository` gets initialized with SHA1 during the startup routine. We are about to stop doing this though such that `the_repository` only ever has a hash function when it was properly initialized via a repo's configuration. To give full control to our users, we would ideally add a new switch to git-diff(1) that allows them to specify the hash function when executed outside of a repository. But for now, we only convert the code to make this explicit such that we can stop setting the default hash algorithm for `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/diff.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/builtin/diff.c b/builtin/diff.c index efc37483b3..9b6cdabe15 100644 --- a/builtin/diff.c +++ b/builtin/diff.c @@ -464,8 +464,17 @@ int cmd_diff(int argc, const char **argv, const char *prefix) !path_inside_repo(prefix, argv[i + 1])))) no_index = DIFF_NO_INDEX_IMPLICIT; } + /* + * When operating outside of a Git repository we need to have a hash + * algorithm at hand so that we can generate the blob hashes. We + * default to SHA1 here, but may eventually want to change this to be + * configurable via a command line option. + */ + if (nongit) + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + init_diff_ui_defaults(); git_config(git_diff_ui_config, NULL); prefix = precompose_argv_prefix(argc, argv, prefix); -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v4 11/13] builtin/shortlog: don't set up revisions without repo 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (9 preceding siblings ...) 2024-05-07 4:53 ` [PATCH v4 10/13] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt @ 2024-05-07 4:53 ` Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 12/13] oss-fuzz/commit-graph: set up hash algorithm Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 13/13] repository: stop setting SHA1 as the default object hash Patrick Steinhardt 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:53 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1360 bytes --] It is possible to run git-shortlog(1) outside of a repository by passing it output from git-log(1) via standard input. Obviously, as there is no repository in that context, it is thus unsupported to pass any revisions as arguments. Regardless of that we still end up calling `setup_revisions()`. While that works alright, it is somewhat strange. Furthermore, this is about to cause problems when we unset the default object hash. Refactor the code to only call `setup_revisions()` when we have a repository. This is safe to do as we already verify that there are no arguments when running outside of a repository anyway. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/shortlog.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/shortlog.c b/builtin/shortlog.c index 3c7cd2d6ef..d4daf31e22 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -434,9 +434,9 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix) error(_("too many arguments given outside repository")); usage_with_options(shortlog_usage, options); } - if (setup_revisions(argc, argv, &rev, NULL) != 1) { + if (!nongit && setup_revisions(argc, argv, &rev, NULL) != 1) { error(_("unrecognized argument: %s"), argv[1]); usage_with_options(shortlog_usage, options); } -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v4 12/13] oss-fuzz/commit-graph: set up hash algorithm 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (10 preceding siblings ...) 2024-05-07 4:53 ` [PATCH v4 11/13] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt @ 2024-05-07 4:53 ` Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 13/13] repository: stop setting SHA1 as the default object hash Patrick Steinhardt 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:53 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1214 bytes --] Our fuzzing setups don't work in a proper repository, but only use the in-memory configured `the_repository`. Consequently, we never go through the full repository setup procedures and thus do not set up the hash algo used by the repository. The commit-graph fuzzer does rely on a properly initialized hash algo though. Initialize it explicitly. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- oss-fuzz/fuzz-commit-graph.c | 1 + 1 file changed, 1 insertion(+) diff --git a/oss-fuzz/fuzz-commit-graph.c b/oss-fuzz/fuzz-commit-graph.c index fe15e2c225..75e668a057 100644 --- a/oss-fuzz/fuzz-commit-graph.c +++ b/oss-fuzz/fuzz-commit-graph.c @@ -18,8 +18,9 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) * normally be read from the repository's gitdir. We want to avoid * touching the disk to keep the individual fuzz-test cases as fast as * possible. */ + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); the_repository->settings.commit_graph_generation_version = 2; the_repository->settings.commit_graph_read_changed_paths = 1; g = parse_commit_graph(&the_repository->settings, (void *)data, size); repo_clear(the_repository); -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v4 13/13] repository: stop setting SHA1 as the default object hash 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt ` (11 preceding siblings ...) 2024-05-07 4:53 ` [PATCH v4 12/13] oss-fuzz/commit-graph: set up hash algorithm Patrick Steinhardt @ 2024-05-07 4:53 ` Patrick Steinhardt 12 siblings, 0 replies; 67+ messages in thread From: Patrick Steinhardt @ 2024-05-07 4:53 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 2672 bytes --] During the startup of Git, we call `initialize_the_repository()` to set up `the_repository` as well as `the_index`. Part of this setup is also to set the default object hash of the repository to SHA1. This has the effect that `the_hash_algo` is getting initialized to SHA1, as well. This default hash algorithm eventually gets overridden by most Git commands via `setup_git_directory()`, which also detects the actual hash algorithm used by the repository. There are some commands though that don't access a repository at all, or at a later point only, and thus retain the default hash function for some amount of time. As some of the the preceding commits demonstrate, this can lead to subtle issues when we access `the_hash_algo` when no repository has been set up. Address this issue by dropping the set up of the default hash algorithm completely. The effect of this is that `the_hash_algo` will map to a `NULL` pointer and thus cause Git to crash when something tries to access the hash algorithm without it being properly initialized. It thus forces all Git commands to explicitly set up the hash algorithm in case there is no repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- repository.c | 20 -------------------- 1 file changed, 20 deletions(-) diff --git a/repository.c b/repository.c index 2118f563e3..15c10015b0 100644 --- a/repository.c +++ b/repository.c @@ -25,28 +25,8 @@ void initialize_repository(struct repository *repo) repo->remote_state = remote_state_new(); repo->parsed_objects = parsed_object_pool_new(); ALLOC_ARRAY(repo->index, 1); index_state_init(repo->index, repo); - - /* - * Unfortunately, we need to keep this hack around for the time being: - * - * - Not setting up the hash algorithm for `the_repository` leads to - * crashes because `the_hash_algo` is a macro that expands to - * `the_repository->hash_algo`. So if Git commands try to access - * `the_hash_algo` without a Git directory we crash. - * - * - Setting up the hash algorithm to be SHA1 by default breaks other - * commands when running with SHA256. - * - * This is another point in case why having global state is a bad idea. - * Eventually, we should remove this hack and stop setting the hash - * algorithm in this function altogether. Instead, it should only ever - * be set via our repository setup procedures. But that requires more - * work. - */ - if (repo == the_repository) - repo_set_hash_algo(repo, GIT_HASH_SHA1); } static void expand_base_dir(char **out, const char *in, const char *base_dir, const char *def_in) -- 2.45.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 67+ messages in thread
end of thread, other threads:[~2024-05-07 4:53 UTC | newest] Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-04-19 9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 01/11] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt 2024-04-19 19:03 ` brian m. carlson 2024-04-22 4:56 ` Patrick Steinhardt 2024-04-22 16:15 ` Junio C Hamano 2024-04-23 4:50 ` Patrick Steinhardt 2024-04-23 16:54 ` Junio C Hamano 2024-04-19 9:51 ` [PATCH 02/11] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt 2024-04-23 0:30 ` Justin Tobler 2024-04-19 9:51 ` [PATCH 03/11] attr: don't recompute default attribute source Patrick Steinhardt 2024-04-23 0:32 ` Justin Tobler 2024-04-19 9:51 ` [PATCH 04/11] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 05/11] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 06/11] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 07/11] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 08/11] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt 2024-04-19 9:51 ` [PATCH 09/11] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt 2024-04-22 18:41 ` Junio C Hamano 2024-04-19 9:51 ` [PATCH 10/11] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt 2024-04-23 0:35 ` Justin Tobler 2024-04-19 9:51 ` [PATCH 11/11] repository: stop setting SHA1 as the default object hash Patrick Steinhardt 2024-04-19 19:12 ` [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` brian m. carlson 2024-04-19 19:16 ` Junio C Hamano 2024-04-22 4:56 ` Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 00/12] " Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 01/12] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 02/12] path: move `validate_headref()` to its only user Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 03/12] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 04/12] attr: don't recompute default attribute source Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 05/12] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 06/12] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt 2024-04-23 5:07 ` [PATCH v2 07/12] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt 2024-04-23 5:08 ` [PATCH v2 08/12] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt 2024-04-23 5:08 ` [PATCH v2 09/12] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt 2024-04-23 5:08 ` [PATCH v2 10/12] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt 2024-04-23 5:08 ` [PATCH v2 11/12] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt 2024-04-23 5:08 ` [PATCH v2 12/12] repository: stop setting SHA1 as the default object hash Patrick Steinhardt 2024-04-27 22:09 ` [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo` Junio C Hamano 2024-04-29 6:05 ` Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 00/13] " Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 01/13] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 02/13] path: move `validate_headref()` to its only user Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 03/13] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 04/13] attr: don't recompute default attribute source Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 05/13] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 06/13] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 07/13] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 08/13] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 09/13] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt 2024-04-29 6:34 ` [PATCH v3 10/13] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt 2024-04-29 6:35 ` [PATCH v3 11/13] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt 2024-04-29 6:35 ` [PATCH v3 12/13] oss-fuzz/commit-graph: set up hash algorithm Patrick Steinhardt 2024-04-29 6:35 ` [PATCH v3 13/13] repository: stop setting SHA1 as the default object hash Patrick Steinhardt 2024-05-07 4:52 ` [PATCH v4 00/13] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt 2024-05-07 4:52 ` [PATCH v4 01/13] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt 2024-05-07 4:52 ` [PATCH v4 02/13] path: move `validate_headref()` to its only user Patrick Steinhardt 2024-05-07 4:52 ` [PATCH v4 03/13] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 04/13] attr: don't recompute default attribute source Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 05/13] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 06/13] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 07/13] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 08/13] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 09/13] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 10/13] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 11/13] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 12/13] oss-fuzz/commit-graph: set up hash algorithm Patrick Steinhardt 2024-05-07 4:53 ` [PATCH v4 13/13] repository: stop setting SHA1 as the default object hash Patrick Steinhardt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).