All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] object-file: reprepare alternates when necessary
@ 2023-03-06 20:59 Derrick Stolee via GitGitGadget
  2023-03-06 22:54 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-06 20:59 UTC (permalink / raw)
  To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When an object is not found in a repository's object store, we sometimes
call reprepare_packed_git() to see if the object was temporarily moved
into a new pack-file (and its old pack-file or loose object was
deleted). This process does a scan of each pack directory within each
odb, but does not reevaluate if the odb list needs updating.

Create a new reprepare_alt_odb() method that is a similar wrapper around
prepare_alt_odb(). Call it from reprepare_packed_git() under the object
read lock to avoid readers from interacting with a potentially
incomplete odb being added to the odb list.

prepare_alt_odb() already avoids adding duplicate odbs to the list
during its progress, so it is safe to call it again from
reprepare_alt_odb() without worrying about duplicate odbs.

This change is specifically for concurrent changes to the repository, so
it is difficult to create a test that guarantees this behavior is
correct. I manually verified by introducing a reprepare_packed_git() call
into get_revision() and stepped into that call in a debugger with a
parent 'git log' process. Multiple runs of reprepare_alt_odb() kept
the_repository->objects->odb as a single-item chain until I added a
.git/objects/info/alternates file in a different process. The next run
added the new odb to the chain and subsequent runs did not add to the
chain.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
    object-file: reprepare alternates when necessary
    
    This subtlety was notice by Michael Haggerty due to how alternates are
    used server-side at $DAYJOB. Moving pack-files from a repository to the
    alternate occasionally causes failures because processes that start
    before the alternate exists don't know how to find that alternate at
    run-time.
    
    Thanks,
    
     * Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1490%2Fderrickstolee%2Fstolee%2Freprepare-alternates-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1490/derrickstolee/stolee/reprepare-alternates-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1490

 object-file.c  | 6 ++++++
 object-store.h | 1 +
 packfile.c     | 1 +
 3 files changed, 8 insertions(+)

diff --git a/object-file.c b/object-file.c
index 939865c1ae0..22acc7fd8e9 100644
--- a/object-file.c
+++ b/object-file.c
@@ -944,6 +944,12 @@ void prepare_alt_odb(struct repository *r)
 	r->objects->loaded_alternates = 1;
 }
 
+void reprepare_alt_odb(struct repository *r)
+{
+	r->objects->loaded_alternates = 0;
+	prepare_alt_odb(r);
+}
+
 /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
 static int freshen_file(const char *fn)
 {
diff --git a/object-store.h b/object-store.h
index 1a713d89d7c..750c29daa54 100644
--- a/object-store.h
+++ b/object-store.h
@@ -56,6 +56,7 @@ KHASH_INIT(odb_path_map, const char * /* key: odb_path */,
 	struct object_directory *, 1, fspathhash, fspatheq)
 
 void prepare_alt_odb(struct repository *r);
+void reprepare_alt_odb(struct repository *r);
 char *compute_alternate_path(const char *path, struct strbuf *err);
 struct object_directory *find_odb(struct repository *r, const char *obj_dir);
 typedef int alt_odb_fn(struct object_directory *, void *);
diff --git a/packfile.c b/packfile.c
index 79e21ab18e7..2b28918a05e 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1008,6 +1008,7 @@ void reprepare_packed_git(struct repository *r)
 	struct object_directory *odb;
 
 	obj_read_lock();
+	reprepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next)
 		odb_clear_loose_cache(odb);
 

base-commit: d15644fe0226af7ffc874572d968598564a230dd
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] object-file: reprepare alternates when necessary
  2023-03-06 20:59 [PATCH] object-file: reprepare alternates when necessary Derrick Stolee via GitGitGadget
@ 2023-03-06 22:54 ` Junio C Hamano
  2023-03-07  0:28   ` Taylor Blau
  2023-03-07 11:28 ` Ævar Arnfjörð Bjarmason
  2023-03-08 18:47 ` [PATCH v2] " Derrick Stolee via GitGitGadget
  2 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2023-03-06 22:54 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -1008,6 +1008,7 @@ void reprepare_packed_git(struct repository *r)
>  	struct object_directory *odb;
>  
>  	obj_read_lock();
> +	reprepare_alt_odb(r);
>  	for (odb = r->objects->odb; odb; odb = odb->next)
>  		odb_clear_loose_cache(odb);

Hmph, if there was an old alternate ODB from which we took some
loose object from and cached, and if that ODB no longer is on the
updated alternate list, would we now fail to clear the loose objects
cache for the ODB?  Or are we only prepared for seeing "more"
alternates and assume no existing alternates go away?

Other than that, looking quite well reasoned.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] object-file: reprepare alternates when necessary
  2023-03-06 22:54 ` Junio C Hamano
@ 2023-03-07  0:28   ` Taylor Blau
  2023-03-07 14:52     ` Derrick Stolee
  0 siblings, 1 reply; 19+ messages in thread
From: Taylor Blau @ 2023-03-07  0:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee via GitGitGadget, git, Derrick Stolee

On Mon, Mar 06, 2023 at 02:54:00PM -0800, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > @@ -1008,6 +1008,7 @@ void reprepare_packed_git(struct repository *r)
> >  	struct object_directory *odb;
> >
> >  	obj_read_lock();
> > +	reprepare_alt_odb(r);
> >  	for (odb = r->objects->odb; odb; odb = odb->next)
> >  		odb_clear_loose_cache(odb);
>
> Hmph, if there was an old alternate ODB from which we took some
> loose object from and cached, and if that ODB no longer is on the
> updated alternate list, would we now fail to clear the loose objects
> cache for the ODB?  Or are we only prepared for seeing "more"
> alternates and assume no existing alternates go away?

Based on my understanding of the patch, we are only prepared to see
"more" alternates, rather than some existing alternate going away.

That being said, I am not certain that is how it works. Perhaps an
alternate "goes away", but does not actually get removed from the list
of alternate ODBs. If that's the case, any object lookup in that
now-missing ODB would fail, but any subsequent ODBs which were added
after calling reprepare_alt_odb() would succeed on that object lookup.

So, I don't know. I don't have the implementation details of the
alternates ODB mechanism paged in enough to say for sure. Hopefully
Stolee can point us in the right direction.

> Other than that, looking quite well reasoned.

Agreed.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] object-file: reprepare alternates when necessary
  2023-03-06 20:59 [PATCH] object-file: reprepare alternates when necessary Derrick Stolee via GitGitGadget
  2023-03-06 22:54 ` Junio C Hamano
@ 2023-03-07 11:28 ` Ævar Arnfjörð Bjarmason
  2023-03-07 17:29   ` Junio C Hamano
  2023-03-08 18:47 ` [PATCH v2] " Derrick Stolee via GitGitGadget
  2 siblings, 1 reply; 19+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-03-07 11:28 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, me, Derrick Stolee


On Mon, Mar 06 2023, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When an object is not found in a repository's object store, we sometimes
> call reprepare_packed_git() to see if the object was temporarily moved
> into a new pack-file (and its old pack-file or loose object was
> deleted). This process does a scan of each pack directory within each
> odb, but does not reevaluate if the odb list needs updating.
>
> Create a new reprepare_alt_odb() method that is a similar wrapper around
> prepare_alt_odb(). Call it from reprepare_packed_git() under the object
> read lock to avoid readers from interacting with a potentially
> incomplete odb being added to the odb list.
>
> prepare_alt_odb() already avoids adding duplicate odbs to the list
> during its progress, so it is safe to call it again from
> reprepare_alt_odb() without worrying about duplicate odbs.
>
> This change is specifically for concurrent changes to the repository, so
> it is difficult to create a test that guarantees this behavior is
> correct. I manually verified by introducing a reprepare_packed_git() call
> into get_revision() and stepped into that call in a debugger with a
> parent 'git log' process. Multiple runs of reprepare_alt_odb() kept
> the_repository->objects->odb as a single-item chain until I added a
> .git/objects/info/alternates file in a different process. The next run
> added the new odb to the chain and subsequent runs did not add to the
> chain.

I found this a bit hard to read, as one migh think from just this
explanation that you're adding ODB locking as a new concept here, adding
"existing" in front of "read lock" above might help.

But in fact we've been doing the locking since 6c307626f1e (grep:
protect packed_git [re-]initialization, 2020-01-15). So the only thing
that really needs justification here is that putting the alternates
re-reading under the same lock

There is a really interesting potential caveat here which you're not
discussing, which is...

> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>     object-file: reprepare alternates when necessary
>     
>     This subtlety was notice by Michael Haggerty due to how alternates are
>     used server-side at $DAYJOB. Moving pack-files from a repository to the
>     alternate occasionally causes failures because processes that start
>     before the alternate exists don't know how to find that alternate at
>     run-time.
>     
>     Thanks,
>     
>      * Stolee
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1490%2Fderrickstolee%2Fstolee%2Freprepare-alternates-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1490/derrickstolee/stolee/reprepare-alternates-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1490
>
>  object-file.c  | 6 ++++++
>  object-store.h | 1 +
>  packfile.c     | 1 +
>  3 files changed, 8 insertions(+)
>
> diff --git a/object-file.c b/object-file.c
> index 939865c1ae0..22acc7fd8e9 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -944,6 +944,12 @@ void prepare_alt_odb(struct repository *r)
>  	r->objects->loaded_alternates = 1;
>  }
>  
> +void reprepare_alt_odb(struct repository *r)
> +{
> +	r->objects->loaded_alternates = 0;
> +	prepare_alt_odb(r);
> +}
> +
>  /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
>  static int freshen_file(const char *fn)
>  {
> diff --git a/object-store.h b/object-store.h
> index 1a713d89d7c..750c29daa54 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -56,6 +56,7 @@ KHASH_INIT(odb_path_map, const char * /* key: odb_path */,
>  	struct object_directory *, 1, fspathhash, fspatheq)
>  
>  void prepare_alt_odb(struct repository *r);
> +void reprepare_alt_odb(struct repository *r);
>  char *compute_alternate_path(const char *path, struct strbuf *err);
>  struct object_directory *find_odb(struct repository *r, const char *obj_dir);
>  typedef int alt_odb_fn(struct object_directory *, void *);
> diff --git a/packfile.c b/packfile.c
> index 79e21ab18e7..2b28918a05e 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1008,6 +1008,7 @@ void reprepare_packed_git(struct repository *r)
>  	struct object_directory *odb;
>  
>  	obj_read_lock();
> +	reprepare_alt_odb(r);
>  	for (odb = r->objects->odb; odb; odb = odb->next)
>  		odb_clear_loose_cache(odb);
>  
>
> base-commit: d15644fe0226af7ffc874572d968598564a230dd

This seems reasonable, but wouldn't this do the same without introducing
an API function just for this one use-case?

That's of course a nit, and you seem to have been adding this for
consistency with reprepare_packed_git(), but it already "owns" the
"approximate_object_count_valid" and "packed_git_initialized" flags in
"struct raw_object_store".

So as we'll only need this from reprepare_packed_git() isn't it better
to declare that "loaded_alternates" is another such flag?

Perhaps not, but as the resulting patch is much shorter it seems worth
considering.

...but to continue the above, the *really* important thing here (and
correct me if I'm wrong) is that we really need to *first* prepare the
alternates, and *then* do the rest, as our new alternates might point to
new loose objects and packs.

So with both of the above (the same could be done with your new helper)
something like this would IMO make that much clearer:

	diff --git a/packfile.c b/packfile.c
	index 79e21ab18e7..50cb46ca4b7 100644
	--- a/packfile.c
	+++ b/packfile.c
	@@ -1008,6 +1008,13 @@ void reprepare_packed_git(struct repository *r)
	 	struct object_directory *odb;
	 
	 	obj_read_lock();
	+	/*
	+	 * New alternates might point to new loose & pack dirs, so we
	+	 * must first read those.
	+	 */
	+	r->objects->loaded_alternates = 0;
	+	prepare_alt_odb(r);
	+
	 	for (odb = r->objects->odb; odb; odb = odb->next)
	 		odb_clear_loose_cache(odb);

And, I think this is an exsiting edge case, but we're only locking the
ODB of the "parent" repository in this case, so if we have alternates in
play aren't we going to racily compute the rest here, the loose objects
and packs of the alternates we're about to consult aren't under the same
lock?



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] object-file: reprepare alternates when necessary
  2023-03-07  0:28   ` Taylor Blau
@ 2023-03-07 14:52     ` Derrick Stolee
  2023-03-07 17:16       ` Junio C Hamano
  2023-03-08 15:55       ` Taylor Blau
  0 siblings, 2 replies; 19+ messages in thread
From: Derrick Stolee @ 2023-03-07 14:52 UTC (permalink / raw)
  To: Taylor Blau, Junio C Hamano; +Cc: Derrick Stolee via GitGitGadget, git

On 3/6/2023 7:28 PM, Taylor Blau wrote:
> On Mon, Mar 06, 2023 at 02:54:00PM -0800, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> @@ -1008,6 +1008,7 @@ void reprepare_packed_git(struct repository *r)
>>>  	struct object_directory *odb;
>>>
>>>  	obj_read_lock();
>>> +	reprepare_alt_odb(r);
>>>  	for (odb = r->objects->odb; odb; odb = odb->next)
>>>  		odb_clear_loose_cache(odb);
>>
>> Hmph, if there was an old alternate ODB from which we took some
>> loose object from and cached, and if that ODB no longer is on the
>> updated alternate list, would we now fail to clear the loose objects
>> cache for the ODB?  Or are we only prepared for seeing "more"
>> alternates and assume no existing alternates go away?
> 
> Based on my understanding of the patch, we are only prepared to see
> "more" alternates, rather than some existing alternate going away.
> 
> That being said, I am not certain that is how it works. Perhaps an
> alternate "goes away", but does not actually get removed from the list
> of alternate ODBs. If that's the case, any object lookup in that
> now-missing ODB would fail, but any subsequent ODBs which were added
> after calling reprepare_alt_odb() would succeed on that object lookup.
> 
> So, I don't know. I don't have the implementation details of the
> alternates ODB mechanism paged in enough to say for sure. Hopefully
> Stolee can point us in the right direction.

The prepare_alt_odb() call only _adds_ to the linked odb list. It
will not remove any existing ODBs. Adding this reprepare_*() method
makes it such that we can use the union of the alternates available
across the lifetime of the process.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] object-file: reprepare alternates when necessary
  2023-03-07 14:52     ` Derrick Stolee
@ 2023-03-07 17:16       ` Junio C Hamano
  2023-03-08 15:55       ` Taylor Blau
  1 sibling, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2023-03-07 17:16 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Derrick Stolee via GitGitGadget, git

Derrick Stolee <derrickstolee@github.com> writes:

> The prepare_alt_odb() call only _adds_ to the linked odb list.

Ah, thanks.  That was what I missed and led to the question.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] object-file: reprepare alternates when necessary
  2023-03-07 11:28 ` Ævar Arnfjörð Bjarmason
@ 2023-03-07 17:29   ` Junio C Hamano
  2023-03-07 18:18     ` Junio C Hamano
  2023-03-08 13:29     ` Derrick Stolee
  0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2023-03-07 17:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, me, Derrick Stolee

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> But in fact we've been doing the locking since 6c307626f1e (grep:
> protect packed_git [re-]initialization, 2020-01-15). So the only thing
> that really needs justification here is that putting the alternates
> re-reading under the same lock
>
> There is a really interesting potential caveat here which you're not
> discussing, which is...
>> ...
>> +void reprepare_alt_odb(struct repository *r)
>> +{
>> +	r->objects->loaded_alternates = 0;
>> +	prepare_alt_odb(r);
>> +}
>> +
>> ...
> This seems reasonable, but wouldn't this do the same without introducing
> an API function just for this one use-case?
>
> That's of course a nit, and you seem to have been adding this for
> consistency with reprepare_packed_git(), but it already "owns" the
> "approximate_object_count_valid" and "packed_git_initialized" flags in
> "struct raw_object_store".
>
> So as we'll only need this from reprepare_packed_git() isn't it better
> to declare that "loaded_alternates" is another such flag?

I am not sure I got what you want to say 100%, but if you are saying
that this "drop the 'loaded' flag and force prepare_*() function to
redo its thing" must not be done only in reprepare_packed_git(), and
that inlining the code there without introducing a helper function
that anybody can casually call without thinking its consequenced
through, then I tend to agree in principle.  But it is just as easy
to lift two lines of code from the rewritten/inlined code to a new
place---to ensure people follow the obj_read_lock() rule, the
comment before it may have to be a bit stronger, I wonder?

> Perhaps not, but as the resulting patch is much shorter it seems worth
> considering.
>
> ...but to continue the above, the *really* important thing here (and
> correct me if I'm wrong) is that we really need to *first* prepare the
> alternates, and *then* do the rest, as our new alternates might point to
> new loose objects and packs.

Yes, and as Derrick explained in another message, we only have to
worry about new ones getting added, not existing ones going away.

> So with both of the above (the same could be done with your new helper)
> something like this would IMO make that much clearer:
>
> 	diff --git a/packfile.c b/packfile.c
> 	index 79e21ab18e7..50cb46ca4b7 100644
> 	--- a/packfile.c
> 	+++ b/packfile.c
> 	@@ -1008,6 +1008,13 @@ void reprepare_packed_git(struct repository *r)
> 	 	struct object_directory *odb;
> 	 
> 	 	obj_read_lock();
> 	+	/*
> 	+	 * New alternates might point to new loose & pack dirs, so we
> 	+	 * must first read those.
> 	+	 */
> 	+	r->objects->loaded_alternates = 0;
> 	+	prepare_alt_odb(r);
> 	+
> 	 	for (odb = r->objects->odb; odb; odb = odb->next)
> 	 		odb_clear_loose_cache(odb);
>
> And, I think this is an exsiting edge case, but we're only locking the
> ODB of the "parent" repository in this case, so if we have alternates in
> play aren't we going to racily compute the rest here, the loose objects
> and packs of the alternates we're about to consult aren't under the same
> lock?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] object-file: reprepare alternates when necessary
  2023-03-07 17:29   ` Junio C Hamano
@ 2023-03-07 18:18     ` Junio C Hamano
  2023-03-08 13:29     ` Derrick Stolee
  1 sibling, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2023-03-07 18:18 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, me, Derrick Stolee

Junio C Hamano <gitster@pobox.com> writes:

> redo its thing" must not be done only in reprepare_packed_git(), and

Oops, "must be done only", of course.  Sorry.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] object-file: reprepare alternates when necessary
  2023-03-07 17:29   ` Junio C Hamano
  2023-03-07 18:18     ` Junio C Hamano
@ 2023-03-08 13:29     ` Derrick Stolee
  1 sibling, 0 replies; 19+ messages in thread
From: Derrick Stolee @ 2023-03-08 13:29 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, me

On 3/7/2023 12:29 PM, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> 
>> But in fact we've been doing the locking since 6c307626f1e (grep:
>> protect packed_git [re-]initialization, 2020-01-15). So the only thing
>> that really needs justification here is that putting the alternates
>> re-reading under the same lock
>>
>> There is a really interesting potential caveat here which you're not
>> discussing, which is...
>>> ...
>>> +void reprepare_alt_odb(struct repository *r)
>>> +{
>>> +	r->objects->loaded_alternates = 0;
>>> +	prepare_alt_odb(r);
>>> +}
>>> +
>>> ...
>> This seems reasonable, but wouldn't this do the same without introducing
>> an API function just for this one use-case?
>>
>> That's of course a nit, and you seem to have been adding this for
>> consistency with reprepare_packed_git(), but it already "owns" the
>> "approximate_object_count_valid" and "packed_git_initialized" flags in
>> "struct raw_object_store".
>>
>> So as we'll only need this from reprepare_packed_git() isn't it better
>> to declare that "loaded_alternates" is another such flag?
> 
> I am not sure I got what you want to say 100%, but if you are saying
> that this "drop the 'loaded' flag and force prepare_*() function to
> redo its thing" must not be done only in reprepare_packed_git(), and
> that inlining the code there without introducing a helper function
> that anybody can casually call without thinking its consequenced
> through, then I tend to agree in principle.  But it is just as easy
> to lift two lines of code from the rewritten/inlined code to a new
> place---to ensure people follow the obj_read_lock() rule, the
> comment before it may have to be a bit stronger, I wonder?

The fact that we do it in a lock in reprepare_packed_git() in the
only current caller does raise suspicion that someone could call it
later and not realize that the lock could be helpful. Inlining the
change within reprepare_packed_git() makes the most sense here
instead of mimicking the pattern.
 
>> Perhaps not, but as the resulting patch is much shorter it seems worth
>> considering.

The shortness of the patch is metric of quality, to me. The other
reason (we might introduce a footgun) is more interesting.

>> ...but to continue the above, the *really* important thing here (and
>> correct me if I'm wrong) is that we really need to *first* prepare the
>> alternates, and *then* do the rest, as our new alternates might point to
>> new loose objects and packs.
> 
> Yes, and as Derrick explained in another message, we only have to
> worry about new ones getting added, not existing ones going away.

I'll be sure to clarify that in my next version.

>> So with both of the above (the same could be done with your new helper)
>> something like this would IMO make that much clearer:
>>
>> 	diff --git a/packfile.c b/packfile.c
>> 	index 79e21ab18e7..50cb46ca4b7 100644
>> 	--- a/packfile.c
>> 	+++ b/packfile.c
>> 	@@ -1008,6 +1008,13 @@ void reprepare_packed_git(struct repository *r)
>> 	 	struct object_directory *odb;
>> 	 
>> 	 	obj_read_lock();
>> 	+	/*
>> 	+	 * New alternates might point to new loose & pack dirs, so we
>> 	+	 * must first read those.
>> 	+	 */
>> 	+	r->objects->loaded_alternates = 0;
>> 	+	prepare_alt_odb(r);
>> 	+
>> 	 	for (odb = r->objects->odb; odb; odb = odb->next)
>> 	 		odb_clear_loose_cache(odb);
>>
>> And, I think this is an exsiting edge case, but we're only locking the
>> ODB of the "parent" repository in this case, so if we have alternates in
>> play aren't we going to racily compute the rest here, the loose objects
>> and packs of the alternates we're about to consult aren't under the same
>> lock?

I don't understand what you are saying here. odb_read_lock() does not
specify a repository and is instead a global lock on reading from any
object database.

Here is its implementation:

extern int obj_read_use_lock;
extern pthread_mutex_t obj_read_mutex;

static inline void obj_read_lock(void)
{
	if(obj_read_use_lock)
		pthread_mutex_lock(&obj_read_mutex);
}

static inline void obj_read_unlock(void)
{
	if(obj_read_use_lock)
		pthread_mutex_unlock(&obj_read_mutex);
}

So I don't believe that your proposed edge case exists.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] object-file: reprepare alternates when necessary
  2023-03-07 14:52     ` Derrick Stolee
  2023-03-07 17:16       ` Junio C Hamano
@ 2023-03-08 15:55       ` Taylor Blau
  2023-03-08 17:13         ` Derrick Stolee
  1 sibling, 1 reply; 19+ messages in thread
From: Taylor Blau @ 2023-03-08 15:55 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git

On Tue, Mar 07, 2023 at 09:52:19AM -0500, Derrick Stolee wrote:
> On 3/6/2023 7:28 PM, Taylor Blau wrote:
> > On Mon, Mar 06, 2023 at 02:54:00PM -0800, Junio C Hamano wrote:
> >> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >>
> >>> @@ -1008,6 +1008,7 @@ void reprepare_packed_git(struct repository *r)
> >>>  	struct object_directory *odb;
> >>>
> >>>  	obj_read_lock();
> >>> +	reprepare_alt_odb(r);
> >>>  	for (odb = r->objects->odb; odb; odb = odb->next)
> >>>  		odb_clear_loose_cache(odb);
> >>
> >> Hmph, if there was an old alternate ODB from which we took some
> >> loose object from and cached, and if that ODB no longer is on the
> >> updated alternate list, would we now fail to clear the loose objects
> >> cache for the ODB?  Or are we only prepared for seeing "more"
> >> alternates and assume no existing alternates go away?
> >
> > Based on my understanding of the patch, we are only prepared to see
> > "more" alternates, rather than some existing alternate going away.
> >
> > That being said, I am not certain that is how it works. Perhaps an
> > alternate "goes away", but does not actually get removed from the list
> > of alternate ODBs. If that's the case, any object lookup in that
> > now-missing ODB would fail, but any subsequent ODBs which were added
> > after calling reprepare_alt_odb() would succeed on that object lookup.
> >
> > So, I don't know. I don't have the implementation details of the
> > alternates ODB mechanism paged in enough to say for sure. Hopefully
> > Stolee can point us in the right direction.
>
> The prepare_alt_odb() call only _adds_ to the linked odb list. It
> will not remove any existing ODBs. Adding this reprepare_*() method
> makes it such that we can use the union of the alternates available
> across the lifetime of the process.

Right, that matches my understanding. What I am asking is: since we only
add ODBs to the list, what happens if we can no longer access an
*existing* alternate at the time we call reprepare_alt_odb()?

It's clear that that now-inaccessible alternate remains in our list of
alternate ODBs, but do all object lookups hitting that ODB fail-over to
the new ODB? I believe so, but it isn't totally clear to me.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] object-file: reprepare alternates when necessary
  2023-03-08 15:55       ` Taylor Blau
@ 2023-03-08 17:13         ` Derrick Stolee
  0 siblings, 0 replies; 19+ messages in thread
From: Derrick Stolee @ 2023-03-08 17:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git

On 3/8/2023 10:55 AM, Taylor Blau wrote:
> On Tue, Mar 07, 2023 at 09:52:19AM -0500, Derrick Stolee wrote:

>> The prepare_alt_odb() call only _adds_ to the linked odb list. It
>> will not remove any existing ODBs. Adding this reprepare_*() method
>> makes it such that we can use the union of the alternates available
>> across the lifetime of the process.
> 
> Right, that matches my understanding. What I am asking is: since we only
> add ODBs to the list, what happens if we can no longer access an
> *existing* alternate at the time we call reprepare_alt_odb()?
> 
> It's clear that that now-inaccessible alternate remains in our list of
> alternate ODBs, but do all object lookups hitting that ODB fail-over to
> the new ODB? I believe so, but it isn't totally clear to me.

It's the same as the pack-file list: if we fail to load something
from one, then we continue to the next one. If an alternate dir
is completely removed during the process, then looking for pack-
files again will fail to see any and continue without error.

This is already possible by deleting an alternate directory
while a Git process is running and might try to open files in it.
Git already recovers from this scenario.

If you're instead talking about the .git/objects/info/alternates
file being modified to remove an alternate from the list, then
Git's current behavior is to keep that alternate around for the
life of the process, and I recommend continuing that behavior.

There's nothing special that we are adding here that doesn't
already exist as protections when files are removed beneath the
Git process.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2] object-file: reprepare alternates when necessary
  2023-03-06 20:59 [PATCH] object-file: reprepare alternates when necessary Derrick Stolee via GitGitGadget
  2023-03-06 22:54 ` Junio C Hamano
  2023-03-07 11:28 ` Ævar Arnfjörð Bjarmason
@ 2023-03-08 18:47 ` Derrick Stolee via GitGitGadget
  2023-03-08 19:35   ` Junio C Hamano
                     ` (2 more replies)
  2 siblings, 3 replies; 19+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-08 18:47 UTC (permalink / raw)
  To: git
  Cc: gitster, me, Ævar Arnfjörð Bjarmason,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When an object is not found in a repository's object store, we sometimes
call reprepare_packed_git() to see if the object was temporarily moved
into a new pack-file (and its old pack-file or loose object was
deleted). This process does a scan of each pack directory within each
odb, but does not reevaluate if the odb list needs updating.

Extend reprepare_packed_git() to also reprepate the alternate odb list
by setting loaded_alternates to zero and calling prepare_alt_odb(). This
will add newly-discoverd odbs to the linked list, but will not duplicate
existing ones nor will it remove existing ones that are no longer listed
in the alternates file. Do this under the object read lock to avoid
readers from interacting with a potentially incomplete odb being added
to the odb list.

If the alternates file was edited to _remove_ some alternates during the
course of the Git process, Git will continue to see alternates that were
ever valid for that repository. ODBs are not removed from the list, the
same as the existing behavior before this change. Git already has
protections against an alternate directory disappearing from the
filesystem during the lifetime of a process, and those are still in
effect.

This change is specifically for concurrent changes to the repository, so
it is difficult to create a test that guarantees this behavior is
correct. I manually verified by introducing a reprepare_packed_git() call
into get_revision() and stepped into that call in a debugger with a
parent 'git log' process. Multiple runs of prepare_alt_odb() kept
the_repository->objects->odb as a single-item chain until I added a
.git/objects/info/alternates file in a different process. The next run
added the new odb to the chain and subsequent runs did not add to the
chain.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
    object-file: reprepare alternates when necessary
    
    This subtlety was notice by Michael Haggerty due to how alternates are
    used server-side at $DAYJOB. Moving pack-files from a repository to the
    alternate occasionally causes failures because processes that start
    before the alternate exists don't know how to find that alternate at
    run-time.
    
    
    Update in v2
    ============
    
     * Instead of creating a new public reprepare_alt_odb() method, inline
       its implementation inside reprepare_packed_git().
     * Update commit message to be explicit about behavior with alternates
       being removed from the alternates file or from disk.
    
    Thanks,
    
     * Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1490%2Fderrickstolee%2Fstolee%2Freprepare-alternates-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1490/derrickstolee/stolee/reprepare-alternates-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1490

Range-diff vs v1:

 1:  3f42c9cdef7 ! 1:  9a5e1d9a9da object-file: reprepare alternates when necessary
     @@ Commit message
          deleted). This process does a scan of each pack directory within each
          odb, but does not reevaluate if the odb list needs updating.
      
     -    Create a new reprepare_alt_odb() method that is a similar wrapper around
     -    prepare_alt_odb(). Call it from reprepare_packed_git() under the object
     -    read lock to avoid readers from interacting with a potentially
     -    incomplete odb being added to the odb list.
     -
     -    prepare_alt_odb() already avoids adding duplicate odbs to the list
     -    during its progress, so it is safe to call it again from
     -    reprepare_alt_odb() without worrying about duplicate odbs.
     +    Extend reprepare_packed_git() to also reprepate the alternate odb list
     +    by setting loaded_alternates to zero and calling prepare_alt_odb(). This
     +    will add newly-discoverd odbs to the linked list, but will not duplicate
     +    existing ones nor will it remove existing ones that are no longer listed
     +    in the alternates file. Do this under the object read lock to avoid
     +    readers from interacting with a potentially incomplete odb being added
     +    to the odb list.
     +
     +    If the alternates file was edited to _remove_ some alternates during the
     +    course of the Git process, Git will continue to see alternates that were
     +    ever valid for that repository. ODBs are not removed from the list, the
     +    same as the existing behavior before this change. Git already has
     +    protections against an alternate directory disappearing from the
     +    filesystem during the lifetime of a process, and those are still in
     +    effect.
      
          This change is specifically for concurrent changes to the repository, so
          it is difficult to create a test that guarantees this behavior is
          correct. I manually verified by introducing a reprepare_packed_git() call
          into get_revision() and stepped into that call in a debugger with a
     -    parent 'git log' process. Multiple runs of reprepare_alt_odb() kept
     +    parent 'git log' process. Multiple runs of prepare_alt_odb() kept
          the_repository->objects->odb as a single-item chain until I added a
          .git/objects/info/alternates file in a different process. The next run
          added the new odb to the chain and subsequent runs did not add to the
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     - ## object-file.c ##
     -@@ object-file.c: void prepare_alt_odb(struct repository *r)
     - 	r->objects->loaded_alternates = 1;
     - }
     - 
     -+void reprepare_alt_odb(struct repository *r)
     -+{
     -+	r->objects->loaded_alternates = 0;
     -+	prepare_alt_odb(r);
     -+}
     -+
     - /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
     - static int freshen_file(const char *fn)
     - {
     -
     - ## object-store.h ##
     -@@ object-store.h: KHASH_INIT(odb_path_map, const char * /* key: odb_path */,
     - 	struct object_directory *, 1, fspathhash, fspatheq)
     - 
     - void prepare_alt_odb(struct repository *r);
     -+void reprepare_alt_odb(struct repository *r);
     - char *compute_alternate_path(const char *path, struct strbuf *err);
     - struct object_directory *find_odb(struct repository *r, const char *obj_dir);
     - typedef int alt_odb_fn(struct object_directory *, void *);
     -
       ## packfile.c ##
      @@ packfile.c: void reprepare_packed_git(struct repository *r)
       	struct object_directory *odb;
       
       	obj_read_lock();
     -+	reprepare_alt_odb(r);
     ++
     ++	/*
     ++	 * Reprepare alt odbs, in case the alternates file was modified
     ++	 * during the course of this process. This only _adds_ odbs to
     ++	 * the linked list, so existing odbs will continue to exist for
     ++	 * the lifetime of the process.
     ++	 */
     ++	r->objects->loaded_alternates = 0;
     ++	prepare_alt_odb(r);
     ++
       	for (odb = r->objects->odb; odb; odb = odb->next)
       		odb_clear_loose_cache(odb);
       


 packfile.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/packfile.c b/packfile.c
index 79e21ab18e7..06419c8e8ca 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1008,6 +1008,16 @@ void reprepare_packed_git(struct repository *r)
 	struct object_directory *odb;
 
 	obj_read_lock();
+
+	/*
+	 * Reprepare alt odbs, in case the alternates file was modified
+	 * during the course of this process. This only _adds_ odbs to
+	 * the linked list, so existing odbs will continue to exist for
+	 * the lifetime of the process.
+	 */
+	r->objects->loaded_alternates = 0;
+	prepare_alt_odb(r);
+
 	for (odb = r->objects->odb; odb; odb = odb->next)
 		odb_clear_loose_cache(odb);
 

base-commit: d15644fe0226af7ffc874572d968598564a230dd
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] object-file: reprepare alternates when necessary
  2023-03-08 18:47 ` [PATCH v2] " Derrick Stolee via GitGitGadget
@ 2023-03-08 19:35   ` Junio C Hamano
  2023-03-08 20:47     ` Taylor Blau
  2023-03-09  7:24   ` Jeff King
  2023-03-10 21:29   ` Jonathan Tan
  2 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2023-03-08 19:35 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, Ævar Arnfjörð Bjarmason, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When an object is not found in a repository's object store, we sometimes
> call reprepare_packed_git() to see if the object was temporarily moved
> into a new pack-file (and its old pack-file or loose object was
> deleted). This process does a scan of each pack directory within each
> odb, but does not reevaluate if the odb list needs updating.

Very well explained, except I do not know what is meant by
"temporarily", which to me implies what was moved is moved back
later.  Without "temporarily", it reads perfectly well.  While I do
not think of a good word to use to tell the readers that the moving
is done by somebody else while we are trying to find the object, if
there were one, that would replace "temporarily" very well.

But other than that tiny nit that does not have to be addressed at
all, everything in this patch looks good.

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] object-file: reprepare alternates when necessary
  2023-03-08 19:35   ` Junio C Hamano
@ 2023-03-08 20:47     ` Taylor Blau
  0 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2023-03-08 20:47 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

On Wed, Mar 08, 2023 at 11:35:06AM -0800, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Derrick Stolee <derrickstolee@github.com>
> >
> > When an object is not found in a repository's object store, we sometimes
> > call reprepare_packed_git() to see if the object was temporarily moved
> > into a new pack-file (and its old pack-file or loose object was
> > deleted). This process does a scan of each pack directory within each
> > odb, but does not reevaluate if the odb list needs updating.
>
> Very well explained, except I do not know what is meant by
> "temporarily", which to me implies what was moved is moved back
> later.  Without "temporarily", it reads perfectly well.  While I do
> not think of a good word to use to tell the readers that the moving
> is done by somebody else while we are trying to find the object, if
> there were one, that would replace "temporarily" very well.
>
> But other than that tiny nit that does not have to be addressed at
> all, everything in this patch looks good.

s/temporarily/concurrently?

I think that concurrently conveys that another process was doing the
moving. But I agree that it doesn't matter here to aid in the overall
understanding of what's happening in this patch.

So I'd be fine with this patch as-is, with s/temporarily/concurrently,
or with s/temporarily//.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] object-file: reprepare alternates when necessary
  2023-03-08 18:47 ` [PATCH v2] " Derrick Stolee via GitGitGadget
  2023-03-08 19:35   ` Junio C Hamano
@ 2023-03-09  7:24   ` Jeff King
  2023-03-09  9:06     ` Eric Wong
  2023-03-10 21:29   ` Jonathan Tan
  2 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2023-03-09  7:24 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Eric Wong, git, gitster, me,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

On Wed, Mar 08, 2023 at 06:47:32PM +0000, Derrick Stolee via GitGitGadget wrote:

> Extend reprepare_packed_git() to also reprepate the alternate odb list

s/reprepate/reprepare/

Unless you were inventing a new word. :)

> This change is specifically for concurrent changes to the repository, so
> it is difficult to create a test that guarantees this behavior is
> correct. I manually verified by introducing a reprepare_packed_git() call
> into get_revision() and stepped into that call in a debugger with a
> parent 'git log' process. Multiple runs of prepare_alt_odb() kept
> the_repository->objects->odb as a single-item chain until I added a
> .git/objects/info/alternates file in a different process. The next run
> added the new odb to the chain and subsequent runs did not add to the
> chain.

One thing that test wouldn't cover is loading alternates from
$GIT_ALTERNATE_OBJECT_DIRECTORIES. Once upon a time, I think we were
pretty inconsistent in finding duplicates. But these days it looks like
it's all done centrally in link_alt_odb_entry(), via alt_odb_usable().
So it should be good.

>     object-file: reprepare alternates when necessary
>     
>     This subtlety was notice by Michael Haggerty due to how alternates are
>     used server-side at $DAYJOB. Moving pack-files from a repository to the
>     alternate occasionally causes failures because processes that start
>     before the alternate exists don't know how to find that alternate at
>     run-time.

Yeah, I don't think there's any real reason not to do this. It simply
doesn't come up in the same way that packfile-repreparing does, because
there it is routine for a concurrent gc to remove an existing packfile
and add the objects elsewhere.

Whereas modifying alternates was never seen as something that would
happen often or automatically. I guess in GitHub's case it is from
converting a repository from stand-alone into a fork, migrating its
objects to a shared one, and adding an alternate.

The only downside might be performance. For sane cases, I think scanning
the new alternates is OK. I know Eric (cc'd) has some crazy
100k-alternate setup (from 407532f82d, etc), but I'd expect a reprepare
there is already expensive (we already have to re-scan every one of
those directories for packfiles, and throw out any loose object caches).

The patch itself looks good to me (and I agree with the sentiment to
just inline the lines rather than adding a function).

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] object-file: reprepare alternates when necessary
  2023-03-09  7:24   ` Jeff King
@ 2023-03-09  9:06     ` Eric Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Wong @ 2023-03-09  9:06 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee via GitGitGadget, git, gitster, me,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Konstantin Ryabitsev

Jeff King <peff@peff.net> wrote:
> The only downside might be performance. For sane cases, I think scanning
> the new alternates is OK. I know Eric (cc'd) has some crazy
> 100k-alternate setup (from 407532f82d, etc), but I'd expect a reprepare
> there is already expensive (we already have to re-scan every one of
> those directories for packfiles, and throw out any loose object caches).

I'm not sure if that 100k alternate thing is happening, yet...
(initial specs called for ~30k, but I figured it might grow)

If it does, I'm thinking about enhancing --batch-command, to support
`add-alternate' to dynamically add alternates while running cat-file.

Right now, my biggest use case is only 250 alternates or so.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] object-file: reprepare alternates when necessary
  2023-03-08 18:47 ` [PATCH v2] " Derrick Stolee via GitGitGadget
  2023-03-08 19:35   ` Junio C Hamano
  2023-03-09  7:24   ` Jeff King
@ 2023-03-10 21:29   ` Jonathan Tan
  2023-03-11  0:01     ` Junio C Hamano
  2 siblings, 1 reply; 19+ messages in thread
From: Jonathan Tan @ 2023-03-10 21:29 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Jonathan Tan, git, gitster, me,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>     Update in v2
>     ============
>     
>      * Instead of creating a new public reprepare_alt_odb() method, inline
>        its implementation inside reprepare_packed_git().

[snip]

> diff --git a/packfile.c b/packfile.c
> index 79e21ab18e7..06419c8e8ca 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1008,6 +1008,16 @@ void reprepare_packed_git(struct repository *r)
>  	struct object_directory *odb;
>  
>  	obj_read_lock();
> +
> +	/*
> +	 * Reprepare alt odbs, in case the alternates file was modified
> +	 * during the course of this process. This only _adds_ odbs to
> +	 * the linked list, so existing odbs will continue to exist for
> +	 * the lifetime of the process.
> +	 */
> +	r->objects->loaded_alternates = 0;
> +	prepare_alt_odb(r);

I understand that the change to inline what was originally
reprepare_alt_odb() was in response to a reviewer comment, but I would
prefer the original version, since this assumes that prepare_alt_odb()
only adds to what's there instead of first clearing the odb linked list
and odb_by_path hashmap.

But I guess clearing a linked list and hashmap can be a bit cumbersome
in C, so maybe it would be reasonable to assume that this behavior
would not change. In any case, maybe a comment should be added
to prepare_alt_odb() saying that if an update of the alternates
is needed, one can do so by clearing loaded_alternates and re-
invoking prepare_alt_odb() (at least so that a developer changing
prepare_alt_odb() can see the comment and understand what behaviors this
function needs to preserve).

Everything else (the new functionality, the commit message etc.) looks
good, of course.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] object-file: reprepare alternates when necessary
  2023-03-10 21:29   ` Jonathan Tan
@ 2023-03-11  0:01     ` Junio C Hamano
  2023-03-11  3:09       ` Jonathan Tan
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2023-03-11  0:01 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Derrick Stolee via GitGitGadget, git, me,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

Jonathan Tan <jonathantanmy@google.com> writes:

> But I guess clearing a linked list and hashmap can be a bit cumbersome
> in C, so maybe it would be reasonable to assume that this behavior
> would not change.

I think the original reason why we did not clear was because we
never knew (and we do not know) what is still in use.  Can anybody
explain why it would be a safe thing to do to rebuild the alt-odb
list from scratch?  Surely we can have a big central lock to do the
"list manipulation" part safely, but is it safe to access the
objects we obtained from one of the odb's that no longer is listed
on the alt-odb list, for example?  The code that this patch touches
clears the loose object cache after updating the odb list, so loose
object cache of any odb that goes away will not be cleared here,
which means that the code that reconstruts alt-odb list would need
to clear the loose object cache automatically?  What should we do to
packfiles that are mmaped from these alt-odbs that goes away?  Etc.
etc.

> In any case, maybe a comment should be added to prepare_alt_odb()
> saying that if an update of the alternates is needed, one can do
> so by clearing loaded_alternates and re- invoking
> prepare_alt_odb() (at least so that a developer changing
> prepare_alt_odb() can see the comment and understand what
> behaviors this function needs to preserve).

Sounds good.

Thanks.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] object-file: reprepare alternates when necessary
  2023-03-11  0:01     ` Junio C Hamano
@ 2023-03-11  3:09       ` Jonathan Tan
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Tan @ 2023-03-11  3:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Tan, Derrick Stolee via GitGitGadget, git, me,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

Junio C Hamano <gitster@pobox.com> writes:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > But I guess clearing a linked list and hashmap can be a bit cumbersome
> > in C, so maybe it would be reasonable to assume that this behavior
> > would not change.
> 
> I think the original reason why we did not clear was because we
> never knew (and we do not know) what is still in use.  

Ah, I meant that prepare_alt_odb() assumes that the data structures
are as they were originally initialized. So I can think of at least
2 ways that we could implement prepare_alt_odb():

 (a) Repeatedly modify r->objects->odb_tail and repeatedly assign to
     kh_value(r->objects->odb_by_path, pos).
 (b) Call make_odb(), which returns a pointer that we assign to
     r->objects->odb, and call make_odb_hashmap(), which returns a
     pointer that we assign to r->objects->odb_by_path.

Remember that prepare_alt_odb() is meant to start from scratch, so it is
reasonable to do either.

In Java, for example, it would be very reasonable to do (b). But
this patch assumes that we have implemented it as (a). Indeed we
have implemented it as (a) (although we have never documented it as
such), and my quoted statement above is meant to say that it might be
reasonable to keep assuming that it would be (a).

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-03-11  3:09 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-06 20:59 [PATCH] object-file: reprepare alternates when necessary Derrick Stolee via GitGitGadget
2023-03-06 22:54 ` Junio C Hamano
2023-03-07  0:28   ` Taylor Blau
2023-03-07 14:52     ` Derrick Stolee
2023-03-07 17:16       ` Junio C Hamano
2023-03-08 15:55       ` Taylor Blau
2023-03-08 17:13         ` Derrick Stolee
2023-03-07 11:28 ` Ævar Arnfjörð Bjarmason
2023-03-07 17:29   ` Junio C Hamano
2023-03-07 18:18     ` Junio C Hamano
2023-03-08 13:29     ` Derrick Stolee
2023-03-08 18:47 ` [PATCH v2] " Derrick Stolee via GitGitGadget
2023-03-08 19:35   ` Junio C Hamano
2023-03-08 20:47     ` Taylor Blau
2023-03-09  7:24   ` Jeff King
2023-03-09  9:06     ` Eric Wong
2023-03-10 21:29   ` Jonathan Tan
2023-03-11  0:01     ` Junio C Hamano
2023-03-11  3:09       ` Jonathan Tan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.