git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] partial-clone: add a partial-clone test case
@ 2022-03-13 17:39 Abhradeep Chakraborty via GitGitGadget
  2022-03-13 19:41 ` Junio C Hamano
  2022-03-16  9:46 ` [PATCH v2] " Abhradeep Chakraborty via GitGitGadget
  0 siblings, 2 replies; 16+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-03-13 17:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Johannes Schindelin,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

In a blobless-cloned repo, `git log --follow -- <path>` (`<path>` have
an exact OID rename) doesn't download blob of the file from where the
new file is renamed.

Add a test case to verify it.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
    partial-clone: add a partial-clone test case
    
    Fixes #827 [1]
    
    [1] https://github.com/gitgitgadget/git/issues/827

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1175%2FAbhra303%2Fcheck_partial_clone-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1175/Abhra303/check_partial_clone-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1175

 t/t0410-partial-clone.sh | 16 ++++++++++++++++
 t/test-lib-functions.sh  |  2 +-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index f17abd298c8..1e54f4844fd 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -618,6 +618,22 @@ test_expect_success 'do not fetch when checking existence of tree we construct o
 	git -C repo cherry-pick side1
 '
 
+test_expect_success 'git log --follow does not download blobs if an exact OID rename found (blobless clone)' '
+	rm -rf repo partial.git &&
+	test_create_repo repo &&
+	content="some dummy content" &&
+	test_commit -C repo create-a-file file.txt "$content" &&
+	git -C repo mv file.txt new-file.txt &&
+	git -C repo commit -m rename-the-file &&
+	test_config -C repo uploadpack.allowfilter 1 &&
+	test_config -C repo uploadpack.allowanysha1inwant 1 &&
+
+	git clone --filter=blob:none "file://$(pwd)/repo" partial.git &&
+	GIT_TRACE2_EVENT="$(pwd)/trace.txt" \
+		git -C partial.git log --follow -- new-file.txt > "$(pwd)/trace.txt" &&
+	! test_subcommand_inexact fetch <trace.txt
+'
+
 test_expect_success 'lazy-fetch when accessing object not in the_repository' '
 	rm -rf full partial.git &&
 	test_create_repo full &&
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 0f439c99d61..07a2b60c103 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1811,7 +1811,7 @@ test_subcommand_inexact () {
 		shift
 	fi
 
-	local expr=$(printf '"%s".*' "$@")
+	local expr=$(printf '.*"%s".*' "$@")
 	expr="${expr%,}"
 
 	if test -n "$negate"

base-commit: 1a4874565fa3b6668042216189551b98b4dc0b1b
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] partial-clone: add a partial-clone test case
  2022-03-13 17:39 [PATCH] partial-clone: add a partial-clone test case Abhradeep Chakraborty via GitGitGadget
@ 2022-03-13 19:41 ` Junio C Hamano
  2022-03-14 15:46   ` Abhradeep Chakraborty
  2022-03-14 16:24   ` Derrick Stolee
  2022-03-16  9:46 ` [PATCH v2] " Abhradeep Chakraborty via GitGitGadget
  1 sibling, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2022-03-13 19:41 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Derrick Stolee, Johannes Schindelin, Abhradeep Chakraborty

"Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> In a blobless-cloned repo, `git log --follow -- <path>` (`<path>` have
> an exact OID rename) doesn't download blob of the file from where the
> new file is renamed.

Is this "doesn't" (documenting current behaviour, without saying if
it is wrong or is desired) or "shouldn't" (documenting the desired
behaviour, which the current implementation may or may not satisfy)?

> +test_expect_success 'git log --follow does not download blobs if an exact OID rename found (blobless clone)' '

That's mouthful.

> +	rm -rf repo partial.git &&
> +	test_create_repo repo &&
> +	content="some dummy content" &&
> +	test_commit -C repo create-a-file file.txt "$content" &&
> +	git -C repo mv file.txt new-file.txt &&
> +	git -C repo commit -m rename-the-file &&
> +	test_config -C repo uploadpack.allowfilter 1 &&
> +	test_config -C repo uploadpack.allowanysha1inwant 1 &&
> +
> +	git clone --filter=blob:none "file://$(pwd)/repo" partial.git &&
> +	GIT_TRACE2_EVENT="$(pwd)/trace.txt" \
> +		git -C partial.git log --follow -- new-file.txt > "$(pwd)/trace.txt" &&

Lose SP after '>'.

		git -C partial.git log --follow -- new-file.txt >"$(pwd)/trace.txt" &&

> +	! test_subcommand_inexact fetch <trace.txt

Looking at the implementation of the helper, it seems to be prepared
to handle negation itself.  Shouldn't this be

	test_subcommand_inexact ! fetch <trace.txt

instead?

> +'
> +
>  test_expect_success 'lazy-fetch when accessing object not in the_repository' '
>  	rm -rf full partial.git &&
>  	test_create_repo full &&
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index 0f439c99d61..07a2b60c103 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -1811,7 +1811,7 @@ test_subcommand_inexact () {
>  		shift
>  	fi
>  
> -	local expr=$(printf '"%s".*' "$@")
> +	local expr=$(printf '.*"%s".*' "$@")

The original wanted to make sure that the arguments to the helper
are initial items of a comma separated list, and an existing caller,
for example, i.e.

  test_subcommand_inexact git pack-objects --honor-pack-keep <trace

is relying on the behaviour to make sure 'git', 'pack-objects', ...
appear at the beginning of "[...]" enclosed list.  This change
breaks its ability to notice that an insertion of unrelated token
before 'git' as an error.

In other words, it looks like an uncalled-for selfish change.

Why can't you specify what should NOT come before "fetch" in your
use of this helper?

>  	expr="${expr%,}"

The preimage already has this problem, but the stripping of trailing
comma here is a result of mistaken copy-and-paste from the exact
variant, I think.  test_subcommand uses

	local expr=$(printf '"%s",' "$@")

to concatenate "$@" into a single comma-separated string, so it
perfectly makes sense to drop the last one here, but with or without
your change here, neither is adding a comma that need to be
stripped.


It is not _your_ theme, but I think this helper is poorly designed,
especially compared to the original "exact" variant.

        test_subcommand_inexact () {
                local negate=
                if test "$1" = "!"
                then
                        negate=t
                        shift
                fi

                local expr=$(printf '"%s".*' "$@")
                expr="${expr%,}"

                if test -n "$negate"
                then
                        ! grep "\"event\":\"child_start\".*\[$expr\]"
                else
                        grep "\"event\":\"child_start\".*\[$expr\]"
                fi
        }


I've already touched that "${expr%,}" there is a totally useful
statement that will always be a no-op.

When "test_subcommand_inexact git pack-objects" is run, the printf
assigns to $expr:

		expr='"git".*"pack-objects".*'

and the actual grep command invoked becomes

	grep '"event":"child_start".*\["git".*"pack-objects".*\]'

I am not sure if that is what we really want.

I wonder if it was more like this that the original wanted to grep for:

	grep '"event":"child_start".*\["git","pack-objects",.*\]'

in which case the two lines there should be more like

	local expr=$(printf '"%s",' "$@")
	expr="${expr%,}.*"

I would think.  This comes from Derrick's e4d0c11c (repack: respect
kept objects with '--write-midx -b', 2021-12-20).


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] partial-clone: add a partial-clone test case
  2022-03-13 19:41 ` Junio C Hamano
@ 2022-03-14 15:46   ` Abhradeep Chakraborty
  2022-03-14 16:25     ` Derrick Stolee
  2022-03-14 21:35     ` Junio C Hamano
  2022-03-14 16:24   ` Derrick Stolee
  1 sibling, 2 replies; 16+ messages in thread
From: Abhradeep Chakraborty @ 2022-03-14 15:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Abhradeep Chakraborty, git, Derrick Stolee, Johannes Schindelin

Junio C Hamano <gitster@pobox.com> wrote:

> Is this "doesn't" (documenting current behaviour, without saying if
> it is wrong or is desired) or "shouldn't" (documenting the desired
> behaviour, which the current implementation may or may not satisfy)?

The current behaviour is okay and this commit adds the test case for it.
So, in that sense, I think "shouldn't" is better word.

> That's mouthful.

Sorry if the test name is very long. But, I couldn't think of shorter
test name than this - to explain what the test case is.

> Lose SP after '>'.
>
> 		git -C partial.git log --follow -- new-file.txt >"$(pwd)/trace.txt" &&

Okay.

> Looking at the implementation of the helper, it seems to be prepared
> to handle negation itself.  Shouldn't this be
>
>	test_subcommand_inexact ! fetch <trace.txt
>
> instead?

Oops, completely missed it. Correcting it :)

> Why can't you specify what should NOT come before "fetch" in your
> use of this helper?

Below is the event triggered for non-exact OID rename -

	git -c fetch.negotiationAlgorithm=noop fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin

Derrick told me to not depend on other flags like
`-c fetch.negotiationAlgorithm` etc. as they might be changed or omitted
and as it makes sense to me also. That's why I didn't specify those things.

> I wonder if it was more like this that the original wanted to grep for:
>
>	grep '"event":"child_start".*\["git","pack-objects",.*\]'

I don't know about other cases, but in my case, atleast I really wanted
it.

So, In this scenerio, should I stick with `test_subcommand_inexact` or I
have to see other helper functions (or make my own) for it?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] partial-clone: add a partial-clone test case
  2022-03-13 19:41 ` Junio C Hamano
  2022-03-14 15:46   ` Abhradeep Chakraborty
@ 2022-03-14 16:24   ` Derrick Stolee
  2022-03-14 22:21     ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Derrick Stolee @ 2022-03-14 16:24 UTC (permalink / raw)
  To: Junio C Hamano, Abhradeep Chakraborty via GitGitGadget
  Cc: git, Johannes Schindelin, Abhradeep Chakraborty

On 3/13/2022 3:41 PM, Junio C Hamano wrote:
> "Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
> writes:

>> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
>> index 0f439c99d61..07a2b60c103 100644
>> --- a/t/test-lib-functions.sh
>> +++ b/t/test-lib-functions.sh
>> @@ -1811,7 +1811,7 @@ test_subcommand_inexact () {
>>  		shift
>>  	fi
>>  
>> -	local expr=$(printf '"%s".*' "$@")
>> +	local expr=$(printf '.*"%s".*' "$@")
> 
> The original wanted to make sure that the arguments to the helper
> are initial items of a comma separated list, and an existing caller,
> for example, i.e.
> 
>   test_subcommand_inexact git pack-objects --honor-pack-keep <trace
> 
> is relying on the behaviour to make sure 'git', 'pack-objects', ...
> appear at the beginning of "[...]" enclosed list.  This change
> breaks its ability to notice that an insertion of unrelated token
> before 'git' as an error.
> 
> In other words, it looks like an uncalled-for selfish change.

This change was my recommendation, but I see that it is probably
not ideal.
 
> Why can't you specify what should NOT come before "fetch" in your
> use of this helper?
> 
>>  	expr="${expr%,}"
> 
> The preimage already has this problem, but the stripping of trailing
> comma here is a result of mistaken copy-and-paste from the exact
> variant, I think.  test_subcommand uses
> 
> 	local expr=$(printf '"%s",' "$@")
> 
> to concatenate "$@" into a single comma-separated string, so it
> perfectly makes sense to drop the last one here, but with or without
> your change here, neither is adding a comma that need to be
> stripped.
> 
> 
> It is not _your_ theme, but I think this helper is poorly designed,
> especially compared to the original "exact" variant.
> 
>         test_subcommand_inexact () {
>                 local negate=
>                 if test "$1" = "!"
>                 then
>                         negate=t
>                         shift
>                 fi
> 
>                 local expr=$(printf '"%s".*' "$@")
>                 expr="${expr%,}"
> 
>                 if test -n "$negate"
>                 then
>                         ! grep "\"event\":\"child_start\".*\[$expr\]"
>                 else
>                         grep "\"event\":\"child_start\".*\[$expr\]"
>                 fi
>         }
> 
> 
> I've already touched that "${expr%,}" there is a totally useful
> statement that will always be a no-op.
> 
> When "test_subcommand_inexact git pack-objects" is run, the printf
> assigns to $expr:
> 
> 		expr='"git".*"pack-objects".*'
> 
> and the actual grep command invoked becomes
> 
> 	grep '"event":"child_start".*\["git".*"pack-objects".*\]'
> 
> I am not sure if that is what we really want.

Ah, yes this certainly seems to not be the expected plan. It does
allow for more flexibility than intended: the intention was to
add flexibility at the end of the command, but instead adds
flexibility throughout, only caring that a certain list of options
is present as a subsequence (except that the first item is the
first item, namely "git" in most cases).

That unintended flexibility would allow the current needs to use

	test_subcommand_inexact ! git fetch

as desired, but there is the additional worries about whether it
is too flexible for the existing uses.

> I wonder if it was more like this that the original wanted to grep for:
> 
> 	grep '"event":"child_start".*\["git","pack-objects",.*\]'
> 
> in which case the two lines there should be more like
> 
> 	local expr=$(printf '"%s",' "$@")
> 	expr="${expr%,}.*"
> 
> I would think.  This comes from Derrick's e4d0c11c (repack: respect
> kept objects with '--write-midx -b', 2021-12-20).

Yep, that was definitely the intention, but I wrote it wrong.

I'm torn between "let's fix it to work as intended and do something
different for this test case" and "this flexibility is unexpected,
but still gives us enough information to trust the tests."

If you think that we should fix the helper to work differently, then
I can work on a patch to do so, so Abhradeep doesn't get too
sidetracked on that.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] partial-clone: add a partial-clone test case
  2022-03-14 15:46   ` Abhradeep Chakraborty
@ 2022-03-14 16:25     ` Derrick Stolee
  2022-03-14 21:42       ` Junio C Hamano
  2022-03-15  8:20       ` Abhradeep Chakraborty
  2022-03-14 21:35     ` Junio C Hamano
  1 sibling, 2 replies; 16+ messages in thread
From: Derrick Stolee @ 2022-03-14 16:25 UTC (permalink / raw)
  To: Abhradeep Chakraborty, Junio C Hamano; +Cc: git, Johannes Schindelin

On 3/14/2022 11:46 AM, Abhradeep Chakraborty wrote:
> Junio C Hamano <gitster@pobox.com> wrote:
>> Why can't you specify what should NOT come before "fetch" in your
>> use of this helper?
> 
> Below is the event triggered for non-exact OID rename -
> 
> 	git -c fetch.negotiationAlgorithm=noop fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin
> 
> Derrick told me to not depend on other flags like
> `-c fetch.negotiationAlgorithm` etc. as they might be changed or omitted
> and as it makes sense to me also. That's why I didn't specify those things.

This reason is something that could be mentioned in the commit message
to motivate the change to the helper.

>> I wonder if it was more like this that the original wanted to grep for:
>>
>> 	grep '"event":"child_start".*\["git","pack-objects",.*\]'
> 
> I don't know about other cases, but in my case, atleast I really wanted
> it.
> 
> So, In this scenerio, should I stick with `test_subcommand_inexact` or I
> have to see other helper functions (or make my own) for it?

As I mentioned earlier, it seems that

	test_subcommand_inexact ! git fetch

would actually work for your needs without changing the helper. We will see
whether or not the helper needs to be updated in a way that that line would
not work anymore.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] partial-clone: add a partial-clone test case
  2022-03-14 15:46   ` Abhradeep Chakraborty
  2022-03-14 16:25     ` Derrick Stolee
@ 2022-03-14 21:35     ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2022-03-14 21:35 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: git, Derrick Stolee, Johannes Schindelin

Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>
>> Is this "doesn't" (documenting current behaviour, without saying if
>> it is wrong or is desired) or "shouldn't" (documenting the desired
>> behaviour, which the current implementation may or may not satisfy)?
>
> The current behaviour is okay and this commit adds the test case for it.
> So, in that sense, I think "shouldn't" is better word.
>
>> That's mouthful.
>
> Sorry if the test name is very long. But, I couldn't think of shorter
> test name than this - to explain what the test case is.

"exact rename does not need to fetch the blob lazily"

>> Lose SP after '>'.
>>
>> 		git -C partial.git log --follow -- new-file.txt >"$(pwd)/trace.txt" &&
>
> Okay.
>
>> Looking at the implementation of the helper, it seems to be prepared
>> to handle negation itself.  Shouldn't this be
>>
>>	test_subcommand_inexact ! fetch <trace.txt
>>
>> instead?
>
> Oops, completely missed it. Correcting it :)
>
>> Why can't you specify what should NOT come before "fetch" in your
>> use of this helper?
>
> Below is the event triggered for non-exact OID rename -
>
> 	git -c fetch.negotiationAlgorithm=noop fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin
>
> Derrick told me to not depend on other flags like
> `-c fetch.negotiationAlgorithm` etc. as they might be changed or omitted
> and as it makes sense to me also. That's why I didn't specify those things.
>
>> I wonder if it was more like this that the original wanted to grep for:
>>
>>	grep '"event":"child_start".*\["git","pack-objects",.*\]'
>
> I don't know about other cases, but in my case, atleast I really wanted
> it.
>
> So, In this scenerio, should I stick with `test_subcommand_inexact` or I
> have to see other helper functions (or make my own) for it?

If you are doing just a single grep, I am not sure why grepping for
"fetch" alone without any helper is insufficient.  In any case,
butchering the "inexcat" helper to loosen it for other existing
users of the same helper does not sound like a good direction to go.

Thanks.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] partial-clone: add a partial-clone test case
  2022-03-14 16:25     ` Derrick Stolee
@ 2022-03-14 21:42       ` Junio C Hamano
  2022-03-15  8:20       ` Abhradeep Chakraborty
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2022-03-14 21:42 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Abhradeep Chakraborty, git, Johannes Schindelin

Derrick Stolee <derrickstolee@github.com> writes:

>> 	git -c fetch.negotiationAlgorithm=noop fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin
>> 
>> Derrick told me to not depend on other flags like
>> `-c fetch.negotiationAlgorithm` etc. as they might be changed or omitted
>> and as it makes sense to me also. That's why I didn't specify those things.
>
> This reason is something that could be mentioned in the commit message
> to motivate the change to the helper.
>
>>> I wonder if it was more like this that the original wanted to grep for:
>>>
>>> 	grep '"event":"child_start".*\["git","pack-objects",.*\]'
>> 
>> I don't know about other cases, but in my case, atleast I really wanted
>> it.
>> 
>> So, In this scenerio, should I stick with `test_subcommand_inexact` or I
>> have to see other helper functions (or make my own) for it?
>
> As I mentioned earlier, it seems that
>
> 	test_subcommand_inexact ! git fetch
>
> would actually work for your needs without changing the helper. We will see
> whether or not the helper needs to be updated in a way that that line would
> not work anymore.

Ah, that is because the current implementation of the helper
sprinkles ".*" in between each and every pair of parameters, so the
resulting pattern \["git".*"fetch".*\] would not be prevented from
matching by the presence of "-c var.iable=value"?

In any case, the _inexact helper may need to be rethought, or at
least documented better what inexact-ness it wants to allow.  With
no concrete goals written down, my guess was that it wanted to
ensure only the earlier command names and options, while ignoring
the later arguments.  And that was where my "adding .* in between
looks wrong, and stripping trailing comma with ${expt%,} is nonsense"
comment came from.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] partial-clone: add a partial-clone test case
  2022-03-14 16:24   ` Derrick Stolee
@ 2022-03-14 22:21     ` Junio C Hamano
  2022-03-15 11:30       ` Abhradeep Chakraborty
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2022-03-14 22:21 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Johannes Schindelin,
	Abhradeep Chakraborty

Derrick Stolee <derrickstolee@github.com> writes:

>> When "test_subcommand_inexact git pack-objects" is run, the printf
>> assigns to $expr:
>> 
>> 		expr='"git".*"pack-objects".*'
>> 
>> and the actual grep command invoked becomes
>> 
>> 	grep '"event":"child_start".*\["git".*"pack-objects".*\]'
>> 
>> I am not sure if that is what we really want.
>
> Ah, yes this certainly seems to not be the expected plan. It does
> allow for more flexibility than intended: the intention was to
> add flexibility at the end of the command, but instead adds
> flexibility throughout, only caring that a certain list of options
> is present as a subsequence (except that the first item is the
> first item, namely "git" in most cases).

I guess I sent a response before reading this message from you.

> That unintended flexibility would allow the current needs to use
>
> 	test_subcommand_inexact ! git fetch
>
> as desired, but there is the additional worries about whether it
> is too flexible for the existing uses.

Yeah, it looked a bit too loose.

> If you think that we should fix the helper to work differently, then
> I can work on a patch to do so, so Abhradeep doesn't get too
> sidetracked on that.

I agree that comparing what _inexact does and what its inventor
wanted it to do and reconciling the differences would be outside
the scope of this topic, which means the test in this patch should
refrain from using the _inexact helper at all.

I found it quite a roundabout way to look into trace to see if
a "fetch" was run to determine if we are doing the right thing.

Regardless of whatever mechanism is used to lazily fetch objects
that have become necessary from the promisor remotes, what we want
to ensure is that the blob object HEAD:new-file.txt is still missing
in our object store after running "log --follow", isn't it?  In a
future version of "git", our on-demand lazy fetch mechanism may not
even invoke "git fetch" under the hood, after all.

Don't we have a more direct way to ask "does this object exist in
our object store, or is it merely left as a promise?" without
triggering a lazy fetching that we can use in this test?  I think
such a direct approach is what we want to use in this test.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] partial-clone: add a partial-clone test case
  2022-03-14 16:25     ` Derrick Stolee
  2022-03-14 21:42       ` Junio C Hamano
@ 2022-03-15  8:20       ` Abhradeep Chakraborty
  1 sibling, 0 replies; 16+ messages in thread
From: Abhradeep Chakraborty @ 2022-03-15  8:20 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Abhradeep Chakraborty, git, Junio C Hamano, Johannes Schindelin

Derrick Stolee <derrickstolee@github.com> wrote:

> This reason is something that could be mentioned in the commit message
> to motivate the change to the helper.

Sure, definitely. Will add it.

> As I mentioned earlier, it seems that
>
>	test_subcommand_inexact ! git fetch
>
> would actually work for your needs without changing the helper. We will see
> whether or not the helper needs to be updated in a way that that line would
> not work anymore.

Okay. got it!

Thanks :)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] partial-clone: add a partial-clone test case
  2022-03-14 22:21     ` Junio C Hamano
@ 2022-03-15 11:30       ` Abhradeep Chakraborty
  2022-03-15 12:57         ` Derrick Stolee
  0 siblings, 1 reply; 16+ messages in thread
From: Abhradeep Chakraborty @ 2022-03-15 11:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Abhradeep Chakraborty, git, Derrick Stolee, Johannes Schindelin

Junio C Hamano <gitster@pobox.com> wrote:

> I found it quite a roundabout way to look into trace to see if
> a "fetch" was run to determine if we are doing the right thing.
>
> Regardless of whatever mechanism is used to lazily fetch objects
> that have become necessary from the promisor remotes, what we want
> to ensure is that the blob object HEAD:new-file.txt is still missing
> in our object store after running "log --follow", isn't it?  In a
> future version of "git", our on-demand lazy fetch mechanism may not
> even invoke "git fetch" under the hood, after all.
>
> Don't we have a more direct way to ask "does this object exist in
> our object store, or is it merely left as a promise?" without
> triggering a lazy fetching that we can use in this test?  I think
> such a direct approach is what we want to use in this test.

I did think of other ways to detect the downloading before. Initially I
thought of using grep functionality to see if any download related
messages are printed or not. But I found that `git log` doesn't print
any download related messages (e.g. like "enumerating ...."). So, I
dropped that idea.

The next idea that came into my mind was to detect if the previous file
(the file from where the new file is renamed) is still missing ( you're
suggesting the same approach). But the problem I faced with this apprach
is I didn't find a way to detect if the file is missing.

I tried to use `git rev-list --objects --missing=print` with `HEAD` and
first commit hash. But in both cases, I didn't found a missing `[?]` sign
before ` <blob-hash-ID> file.txt`. That means, both blob objects ( or I
think the same blob object) exists in the local repo.

Most probably, this is because the content of these two files are same.
Blob object is never modified. So, as far as I think, both base trees ( one
which is pointed by the initial commit and other which is pointed by the latest
commit) is pointing to the same blob object. As a result, the file is not
missing. ( I am not sure if git really does it as I said; sorry if I am
wrong here)

That's why I asked Derrick to help me detecting the download and he
suggested me this. It is true that this is not ideal (use of `fetch` may
change later). But this was the only option that I can go with.

If you find another/better way or if you think I tried to use `git rev-list
...` in the wrong way -  please tell me. I would be very happy.

Thanks :)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] partial-clone: add a partial-clone test case
  2022-03-15 11:30       ` Abhradeep Chakraborty
@ 2022-03-15 12:57         ` Derrick Stolee
  2022-03-15 15:15           ` Abhradeep Chakraborty
  2022-03-15 16:13           ` Junio C Hamano
  0 siblings, 2 replies; 16+ messages in thread
From: Derrick Stolee @ 2022-03-15 12:57 UTC (permalink / raw)
  To: Abhradeep Chakraborty, Junio C Hamano; +Cc: git, Johannes Schindelin

On 3/15/2022 7:30 AM, Abhradeep Chakraborty wrote:
> Junio C Hamano <gitster@pobox.com> wrote:
> 
>> I found it quite a roundabout way to look into trace to see if
>> a "fetch" was run to determine if we are doing the right thing.
>>
>> Regardless of whatever mechanism is used to lazily fetch objects
>> that have become necessary from the promisor remotes, what we want
>> to ensure is that the blob object HEAD:new-file.txt is still missing
>> in our object store after running "log --follow", isn't it?  In a
>> future version of "git", our on-demand lazy fetch mechanism may not
>> even invoke "git fetch" under the hood, after all.
>>
>> Don't we have a more direct way to ask "does this object exist in
>> our object store, or is it merely left as a promise?" without
>> triggering a lazy fetching that we can use in this test?  I think
>> such a direct approach is what we want to use in this test.
> 
> I did think of other ways to detect the downloading before. Initially I
> thought of using grep functionality to see if any download related
> messages are printed or not. But I found that `git log` doesn't print
> any download related messages (e.g. like "enumerating ...."). So, I
> dropped that idea.
> 
> The next idea that came into my mind was to detect if the previous file
> (the file from where the new file is renamed) is still missing ( you're
> suggesting the same approach). But the problem I faced with this apprach
> is I didn't find a way to detect if the file is missing.
> 
> I tried to use `git rev-list --objects --missing=print` with `HEAD` and
> first commit hash. But in both cases, I didn't found a missing `[?]` sign
> before ` <blob-hash-ID> file.txt`. That means, both blob objects ( or I
> think the same blob object) exists in the local repo.

Ah! This method really should have worked. And the test as written
is not testing the right thing, because we also skip downloading the
blobs if we already have them.

I think the key issue is that your clone says this:

+	git clone --filter=blob:none "file://$(pwd)/repo" partial.git &&

which will do a checkout and download the blobs at tip.

If you add a "--bare" to this clone command, then no blobs should be
downloaded, and your rev-list command should show the missing objects.

> Most probably, this is because the content of these two files are same.
> Blob object is never modified. So, as far as I think, both base trees ( one
> which is pointed by the initial commit and other which is pointed by the latest
> commit) is pointing to the same blob object. As a result, the file is not
> missing. ( I am not sure if git really does it as I said; sorry if I am
> wrong here)
> 
> That's why I asked Derrick to help me detecting the download and he
> suggested me this. It is true that this is not ideal (use of `fetch` may
> change later). But this was the only option that I can go with.
> 
> If you find another/better way or if you think I tried to use `git rev-list
> ...` in the wrong way -  please tell me. I would be very happy.

Hopefully my suggestion to use --bare will help.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] partial-clone: add a partial-clone test case
  2022-03-15 12:57         ` Derrick Stolee
@ 2022-03-15 15:15           ` Abhradeep Chakraborty
  2022-03-15 16:13           ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Abhradeep Chakraborty @ 2022-03-15 15:15 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Abhradeep Chakraborty, git, Junio C Hamano, Johannes Schindelin

Derrick Stolee <derrickstolee@github.com> wrote:

> Ah! This method really should have worked. And the test as written
> is not testing the right thing, because we also skip downloading the
> blobs if we already have them.
>
> I think the key issue is that your clone says this:
>
> +	git clone --filter=blob:none "file://$(pwd)/repo" partial.git &&
>
> which will do a checkout and download the blobs at tip.
>
> If you add a "--bare" to this clone command, then no blobs should be
> downloaded, and your rev-list command should show the missing objects.

Thanks Derrick! Let my try it.

Thanks :)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] partial-clone: add a partial-clone test case
  2022-03-15 12:57         ` Derrick Stolee
  2022-03-15 15:15           ` Abhradeep Chakraborty
@ 2022-03-15 16:13           ` Junio C Hamano
  2022-03-16  8:06             ` Abhradeep Chakraborty
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2022-03-15 16:13 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Abhradeep Chakraborty, git, Johannes Schindelin

Derrick Stolee <derrickstolee@github.com> writes:

>> I tried to use `git rev-list --objects --missing=print` with `HEAD` and
>> first commit hash. But in both cases, I didn't found a missing `[?]` sign
>> before ` <blob-hash-ID> file.txt`. That means, both blob objects ( or I
>> think the same blob object) exists in the local repo.
> ...

Yup.  I was about to suggest --missing=allow-promisor to catch other
unexpected missing objects, but in this toy history for testing,
what is missing is all the objects expected from the promisor, so it
should be sufficient to use --missing=print.

> I think the key issue is that your clone says this:
>
> +	git clone --filter=blob:none "file://$(pwd)/repo" partial.git &&
>
> which will do a checkout and download the blobs at tip.
>
> If you add a "--bare" to this clone command, then no blobs should be
> downloaded, and your rev-list command should show the missing objects.

That sounds like pointing at a different issue.  If the test
repository downloads the blobs at the tip, then the fact that the
trace output did not have "fetch" in it does not mean much, does it?
It could be that we refrained from lazily download the blob because
we did not need its contents for the purpose of following through an
exact rename, but it could also be that we did not need to lazily
download it because we already had it.

> Hopefully my suggestion to use --bare will help.

Yup, thanks.

So regardless of "--missing=print" vs "grep in trace", there was a
bug in the test set-up, and we caught it in this discussion, which
is excellent.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] partial-clone: add a partial-clone test case
  2022-03-15 16:13           ` Junio C Hamano
@ 2022-03-16  8:06             ` Abhradeep Chakraborty
  0 siblings, 0 replies; 16+ messages in thread
From: Abhradeep Chakraborty @ 2022-03-16  8:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Abhradeep Chakraborty, git, Derrick Stolee, Johannes Schindelin

Junio C Hamano <gitster@pobox.com> wrote:

> That sounds like pointing at a different issue.  If the test
> repository downloads the blobs at the tip, then the fact that the
> trace output did not have "fetch" in it does not mean much, does it?
> It could be that we refrained from lazily download the blob because
> we did not need its contents for the purpose of following through an
> exact rename, but it could also be that we did not need to lazily
> download it because we already had it.

hmm, you're right.

> So regardless of "--missing=print" vs "grep in trace", there was a
> bug in the test set-up, and we caught it in this discussion, which
> is excellent.

:)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2] partial-clone: add a partial-clone test case
  2022-03-13 17:39 [PATCH] partial-clone: add a partial-clone test case Abhradeep Chakraborty via GitGitGadget
  2022-03-13 19:41 ` Junio C Hamano
@ 2022-03-16  9:46 ` Abhradeep Chakraborty via GitGitGadget
  2022-03-21 15:26   ` Derrick Stolee
  1 sibling, 1 reply; 16+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-03-16  9:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Johannes Schindelin,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

In a blobless-cloned repo, `git log --follow -- <path>` (`<path>` have
an exact OID rename) shouldn't download blob of the file from where the
new file is renamed.

Add a test case to verify it.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
    partial-clone: add a partial-clone test case
    
    changes since v1:
    
     1. remove the event track method to detect the downloading as it is not
        future proof ( and buggy).
     2. Instead see if the file is missing initially and after running the
        git log --follow ... command.
    
    Fixes #827 [1]
    
    [1] https://github.com/gitgitgadget/git/issues/827

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1175%2FAbhra303%2Fcheck_partial_clone-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1175/Abhra303/check_partial_clone-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1175

Range-diff vs v1:

 1:  6b80b981977 ! 1:  52df433ad5e partial-clone: add a partial-clone test case
     @@ Commit message
          partial-clone: add a partial-clone test case
      
          In a blobless-cloned repo, `git log --follow -- <path>` (`<path>` have
     -    an exact OID rename) doesn't download blob of the file from where the
     +    an exact OID rename) shouldn't download blob of the file from where the
          new file is renamed.
      
          Add a test case to verify it.
     @@ t/t0410-partial-clone.sh: test_expect_success 'do not fetch when checking existe
       	git -C repo cherry-pick side1
       '
       
     -+test_expect_success 'git log --follow does not download blobs if an exact OID rename found (blobless clone)' '
     ++test_expect_success 'exact rename does not need to fetch the blob lazily' '
      +	rm -rf repo partial.git &&
      +	test_create_repo repo &&
      +	content="some dummy content" &&
      +	test_commit -C repo create-a-file file.txt "$content" &&
      +	git -C repo mv file.txt new-file.txt &&
      +	git -C repo commit -m rename-the-file &&
     ++	FILE_HASH=$(git -C repo rev-parse HEAD:new-file.txt) &&
      +	test_config -C repo uploadpack.allowfilter 1 &&
      +	test_config -C repo uploadpack.allowanysha1inwant 1 &&
      +
     -+	git clone --filter=blob:none "file://$(pwd)/repo" partial.git &&
     -+	GIT_TRACE2_EVENT="$(pwd)/trace.txt" \
     -+		git -C partial.git log --follow -- new-file.txt > "$(pwd)/trace.txt" &&
     -+	! test_subcommand_inexact fetch <trace.txt
     ++	git clone --filter=blob:none --bare "file://$(pwd)/repo" partial.git &&
     ++	git -C partial.git rev-list --objects --missing=print HEAD >out &&
     ++	grep "[?]$FILE_HASH" out &&
     ++	git -C partial.git log --follow -- new-file.txt &&
     ++	git -C partial.git rev-list --objects --missing=print HEAD >out &&
     ++	grep "[?]$FILE_HASH" out
      +'
      +
       test_expect_success 'lazy-fetch when accessing object not in the_repository' '
       	rm -rf full partial.git &&
       	test_create_repo full &&
     -
     - ## t/test-lib-functions.sh ##
     -@@ t/test-lib-functions.sh: test_subcommand_inexact () {
     - 		shift
     - 	fi
     - 
     --	local expr=$(printf '"%s".*' "$@")
     -+	local expr=$(printf '.*"%s".*' "$@")
     - 	expr="${expr%,}"
     - 
     - 	if test -n "$negate"


 t/t0410-partial-clone.sh | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index f17abd298c8..1e864cf3172 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -618,6 +618,25 @@ test_expect_success 'do not fetch when checking existence of tree we construct o
 	git -C repo cherry-pick side1
 '
 
+test_expect_success 'exact rename does not need to fetch the blob lazily' '
+	rm -rf repo partial.git &&
+	test_create_repo repo &&
+	content="some dummy content" &&
+	test_commit -C repo create-a-file file.txt "$content" &&
+	git -C repo mv file.txt new-file.txt &&
+	git -C repo commit -m rename-the-file &&
+	FILE_HASH=$(git -C repo rev-parse HEAD:new-file.txt) &&
+	test_config -C repo uploadpack.allowfilter 1 &&
+	test_config -C repo uploadpack.allowanysha1inwant 1 &&
+
+	git clone --filter=blob:none --bare "file://$(pwd)/repo" partial.git &&
+	git -C partial.git rev-list --objects --missing=print HEAD >out &&
+	grep "[?]$FILE_HASH" out &&
+	git -C partial.git log --follow -- new-file.txt &&
+	git -C partial.git rev-list --objects --missing=print HEAD >out &&
+	grep "[?]$FILE_HASH" out
+'
+
 test_expect_success 'lazy-fetch when accessing object not in the_repository' '
 	rm -rf full partial.git &&
 	test_create_repo full &&

base-commit: 1a4874565fa3b6668042216189551b98b4dc0b1b
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] partial-clone: add a partial-clone test case
  2022-03-16  9:46 ` [PATCH v2] " Abhradeep Chakraborty via GitGitGadget
@ 2022-03-21 15:26   ` Derrick Stolee
  0 siblings, 0 replies; 16+ messages in thread
From: Derrick Stolee @ 2022-03-21 15:26 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget, git
  Cc: Junio C Hamano, Johannes Schindelin, Abhradeep Chakraborty

On 3/16/2022 5:46 AM, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> 
> In a blobless-cloned repo, `git log --follow -- <path>` (`<path>` have
> an exact OID rename) shouldn't download blob of the file from where the
> new file is renamed.
> 
> Add a test case to verify it.
> 
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>     partial-clone: add a partial-clone test case
>     
>     changes since v1:
>     
>      1. remove the event track method to detect the downloading as it is not
>         future proof ( and buggy).
>      2. Instead see if the file is missing initially and after running the
>         git log --follow ... command.

> +test_expect_success 'exact rename does not need to fetch the blob lazily' '
> +	rm -rf repo partial.git &&
> +	test_create_repo repo &&
> +	content="some dummy content" &&
> +	test_commit -C repo create-a-file file.txt "$content" &&
> +	git -C repo mv file.txt new-file.txt &&
> +	git -C repo commit -m rename-the-file &&
> +	FILE_HASH=$(git -C repo rev-parse HEAD:new-file.txt) &&
> +	test_config -C repo uploadpack.allowfilter 1 &&
> +	test_config -C repo uploadpack.allowanysha1inwant 1 &&
> +
> +	git clone --filter=blob:none --bare "file://$(pwd)/repo" partial.git &&
> +	git -C partial.git rev-list --objects --missing=print HEAD >out &&
> +	grep "[?]$FILE_HASH" out &&
> +	git -C partial.git log --follow -- new-file.txt &&
> +	git -C partial.git rev-list --objects --missing=print HEAD >out &&
> +	grep "[?]$FILE_HASH" out
> +'
> +

Thanks for taking the feedback and turning it into a good test!

I'm adding it as a TODO to go fix the buggy helper as discussed.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-03-21 15:26 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-13 17:39 [PATCH] partial-clone: add a partial-clone test case Abhradeep Chakraborty via GitGitGadget
2022-03-13 19:41 ` Junio C Hamano
2022-03-14 15:46   ` Abhradeep Chakraborty
2022-03-14 16:25     ` Derrick Stolee
2022-03-14 21:42       ` Junio C Hamano
2022-03-15  8:20       ` Abhradeep Chakraborty
2022-03-14 21:35     ` Junio C Hamano
2022-03-14 16:24   ` Derrick Stolee
2022-03-14 22:21     ` Junio C Hamano
2022-03-15 11:30       ` Abhradeep Chakraborty
2022-03-15 12:57         ` Derrick Stolee
2022-03-15 15:15           ` Abhradeep Chakraborty
2022-03-15 16:13           ` Junio C Hamano
2022-03-16  8:06             ` Abhradeep Chakraborty
2022-03-16  9:46 ` [PATCH v2] " Abhradeep Chakraborty via GitGitGadget
2022-03-21 15:26   ` Derrick Stolee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).