git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
	Chinmoy via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Chinmoy <chinmoy12c@gmail.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v3] cache-tree.c: remove implicit dependency on the_repository
Date: Mon, 5 Apr 2021 09:08:44 -0400	[thread overview]
Message-ID: <be9793f0-f437-8e42-d463-21a48d2ee948@gmail.com> (raw)
In-Reply-To: <xmqqy2dyy40l.fsf@gitster.g>

On 4/4/2021 2:09 AM, Junio C Hamano wrote:
> "Chinmoy via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> diff --git a/sparse-index.c b/sparse-index.c
>> index 95ea17174da3..e4323ffd81db 100644
>> --- a/sparse-index.c
>> +++ b/sparse-index.c
>> @@ -128,12 +128,14 @@ int set_sparse_index_config(struct repository *repo, int enable)
>>  int convert_to_sparse(struct index_state *istate)
>>  {
>>  	int test_env;
>> +	struct repository *r = the_repository;
>> +
>>  	if (istate->split_index || istate->sparse_index ||
>>  	    !core_apply_sparse_checkout || !core_sparse_checkout_cone)
>>  		return 0;
>>  
>>  	if (!istate->repo)
>> -		istate->repo = the_repository;
>> +		istate->repo = r;
>>  
> 
> I am not quite sure if this is a reasonable conversion.  Surely it
> would not make the compiler barf, but are we sure that the caller of
> convert_to_sparse() wants us to work on the_repository instance and
> no other repository?  As an istate has a .repo member, it seems to
> me that using istate->repo would be a lot saner approach than
> assigning the_repository upfront to r.  It might be even needed, if
> cache_tree_update() wants to take a repository instance, to ensure
> that all callers to this helper sets istate->repo before they call
> it so that the above "if not set yet, use the_repository" code does
> not have to kick in.
> 
>>  	/*
>>  	 * The GIT_TEST_SPARSE_INDEX environment variable triggers the
>> @@ -161,7 +163,7 @@ int convert_to_sparse(struct index_state *istate)
>>  		return -1;
>>  	}
>>  
>> -	if (cache_tree_update(istate, 0)) {
>> +	if (cache_tree_update(r, istate, 0)) {
> 
> And this looks like a bad conversion.  It may happen to do the same
> thing, but the flow of the logic up to this point in the function
> was to make sure istate->repo is not empty by filling it if it is
> not yet set, and update the cache tree of that istate.  So, it seems
> more logical if this call were like so, no?
> 
> 	if (cache_tree_update(istate->repo, istate, 0)) {
> 
> In fact, in the world after 1fd9ae51 (repository: add repo reference
> to index_state, 2021-01-23), it is dubious that this topic to teach
> cache_tree_update() to take a repository pointer is sensible.  While
> working on a single repository, we may create multiple in-core index
> instances that represent temporary indices, but each of these in-core
> index instances (i.e. istate) belong to a single repository.
> 
> And in a call to cache_tree_update(R, I, F), if I->repo is *NOT* R,
> that must mean a bug.  Here is what 1fd9ae51 says on this point.
> 
>     repository: add repo reference to index_state
> 
>     It will be helpful to add behavior to index operations that might
>     trigger an object lookup. Since each index belongs to a specific
>     repository, add a 'repo' pointer to struct index_state that allows
>     access to this repository.
> 
>     Add a BUG() statement if the repo already has an index, and the index
>     already has a repo, but somehow the index points to a different repo.
> 
>     This will prevent future changes from needing to pass an additional
>     'struct repository *repo' parameter and instead rely only on the 'struct
>     index_state *istate' parameter.
> 
> Derrick, what's you thought on this?  The patch under discussion
> looks to me a prime example of "future change(s)" needing "to pass
> an additional 'struct repository *repo' parameter".

With your additional comments, I think it is clear that the "fourth
option" I mentioned earlier [1] is the way to go:

  Finally, there is yet a fourth option: use istate->repo instead. In
  1fd9ae51 (repository: add repo reference to index_state), I added a
  'repo' member to struct index_state. This is intended for methods to
  access a repository directly from the index.

[1] https://lore.kernel.org/git/f187df01-8e59-ac74-01e1-586a7a63fd4e@gmail.com/

So in this sense, we should always use istate->repo, but we might
still need the following guard in some places:

	if (!istate->repo)
		istate->repo = the_repository;

in case there are situations where the index is loaded before
the_repository is loaded. I have hit this in testing, but don't fully
understand the cases where this can happen.

The way it would change this patch is to drop the 'struct repository *r'
pointers and changes to the method signatures. Instead, keep the
methods only taking a 'struct index_state *istate' and use istate->repo
everywhere.

Thanks,
-Stolee

  reply	other threads:[~2021-04-05 13:08 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-25 19:48 [PATCH] cache-tree.c: remove implicit dependency on the_repository Chinmoy via GitGitGadget
2021-03-25 20:31 ` Derrick Stolee
2021-03-26  6:49   ` Chinmoy Chakraborty
2021-03-26  7:54   ` Chinmoy Chakraborty
2021-03-26 15:35 ` [PATCH v2] " Chinmoy via GitGitGadget
2021-04-03 15:57   ` [PATCH v3] " Chinmoy via GitGitGadget
2021-04-04  1:49     ` Junio C Hamano
2021-04-04  5:11       ` Chinmoy Chakraborty
2021-04-04  5:36         ` Junio C Hamano
2021-04-04  5:18       ` Chinmoy Chakraborty
2021-04-04  6:09     ` Junio C Hamano
2021-04-05 13:08       ` Derrick Stolee [this message]
2021-04-05 17:48         ` Junio C Hamano
2021-04-07  6:54     ` [PATCH v4] " Chinmoy via GitGitGadget
2021-04-07 23:03       ` Junio C Hamano
2021-04-08  3:56         ` Chinmoy Chakraborty
2021-04-08 13:23           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=be9793f0-f437-8e42-d463-21a48d2ee948@gmail.com \
    --to=stolee@gmail.com \
    --cc=chinmoy12c@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).