git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Slightly simplify partial clone user experience
@ 2020-03-19 17:28 Derrick Stolee via GitGitGadget
  2020-03-19 17:28 ` [PATCH 1/2] partial-clone: set default filter with --partial Derrick Stolee via GitGitGadget
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-03-19 17:28 UTC (permalink / raw)
  To: git; +Cc: me, jonathantanmy, christian.couder, git, Derrick Stolee

This was something discussed briefly at the contributor summit: users will
have a hard time remembering git clone --filter=blob:none <url>. This series
simply adds a --partial option that is equivalent to --filter=blob:none,
with the ability to specify a size using --partial=<size> that is equivalent
to --filter=blob:limit=<size>.

While going to the git clone documentation to add this option, I noticed
that there is no reference to the --filter option there, nor any discussion
of partial clone. I resolved the former.

REQUEST FOR HELP: If anyone out there is looking for an opportunity to
contribute documentation, I feel there is a need to include a section in the 
git clone documentation devoted to partial clone. Hopefully such a section
would include answers to these questions:

 1. What is a partial clone?
 2. Why would I want a partial clone?
 3. How do I need to be careful when using a partial clone?

I don't have time to write such a document right now, which is why I'm
asking for help if anyone is looking for a non-code way to contribute to the
project.

Thanks, -Stolee

Derrick Stolee (2):
  partial-clone: set default filter with --partial
  clone: document --partial and --filter options

 Documentation/git-clone.txt   | 15 ++++++++++++-
 list-objects-filter-options.c | 18 +++++++++++++++
 list-objects-filter-options.h |  8 ++++++-
 t/t5616-partial-clone.sh      | 42 ++++++++++++++++++++++++++---------
 4 files changed, 71 insertions(+), 12 deletions(-)


base-commit: 6c85aac65fb455af85745130ce35ddae4678db84
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-586%2Fderrickstolee%2Fpartial-clone-ux-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-586/derrickstolee/partial-clone-ux-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/586
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] partial-clone: set default filter with --partial
  2020-03-19 17:28 [PATCH 0/2] Slightly simplify partial clone user experience Derrick Stolee via GitGitGadget
@ 2020-03-19 17:28 ` Derrick Stolee via GitGitGadget
  2020-03-20 20:26   ` Junio C Hamano
  2020-03-19 17:28 ` [PATCH 2/2] clone: document --partial and --filter options Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-03-19 17:28 UTC (permalink / raw)
  To: git
  Cc: me, jonathantanmy, christian.couder, git, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Partial clone means a lot of different things, including filtering
out all blobs, large blobs, or objects along a certain pathspec.
The pathspec option has limited uses, in particular due to the
performance challenges in serving with such filters. The blob size
option can be helpful for repositories with a small number of large
binaries, but otherwise it is difficult to find a meaningful split
between "small" and "large" blobs.

When I think of or recommend partial clone, I specifically mention
the case of filtering out all blobs, and downloading those blobs
only as needed.

This case is extremely useful, since it takes the best part of
shallow clone (a very small initial download) without any of the
downsides of restricted history.

However, the command-line interface can be confusing:

	git clone --filter=blob:none <url>

Add a simpler "--partial" option that defaults to this case:

	git clone --partial <url>

This should make the feature more discoverable. However, there is
a significant interest in the size-limited filters as that behaves
very similarly to Git LFS. For those cases, the following is
available:

	git clone --partial=<size> <url>

There are quite a few commands using OPT_PARSE_LIST_OBJECTS_FILTER,
includeing clone, fetch, and pack-objects. Augment this macro to
include the "--partial[=<size>]" mode for free.

Modify the first partial clone test that checks --filter=blob:none
to also check --partial with the same expected conditions after
clone. The diff is much simpler to see when ignoring whitespace,
since several lines added a leading tab. This test is essentially
copied to include the two ways to specify a blob size limit of
one byte.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 list-objects-filter-options.c | 18 +++++++++++++++
 list-objects-filter-options.h |  8 ++++++-
 t/t5616-partial-clone.sh      | 42 ++++++++++++++++++++++++++---------
 3 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 256bcfbdfe6..a71716ef75e 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -270,6 +270,24 @@ int opt_parse_list_objects_filter(const struct option *opt,
 	return 0;
 }
 
+int opt_set_blob_none_filter(const struct option *opt,
+			     const char *arg, int unset)
+{
+	struct strbuf filter_arg = STRBUF_INIT;
+	struct list_objects_filter_options *filter_options = opt->value;
+	
+	if (unset || !arg || !strcmp(arg, "0")) {
+		parse_list_objects_filter(filter_options, "blob:none");
+		return 0;
+	}
+	
+	strbuf_addf(&filter_arg, "blob:limit=%s", arg);
+	parse_list_objects_filter(filter_options, filter_arg.buf);
+	strbuf_release(&filter_arg);
+
+	return 0;
+}
+
 const char *list_objects_filter_spec(struct list_objects_filter_options *filter)
 {
 	if (!filter->filter_spec.nr)
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 2ffb39222c4..ac38ffcbe86 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -62,6 +62,7 @@ struct list_objects_filter_options {
 
 /* Normalized command line arguments */
 #define CL_ARG__FILTER "filter"
+#define CL_ARG__PARTIAL "partial"
 
 void list_objects_filter_die_if_populated(
 	struct list_objects_filter_options *filter_options);
@@ -80,11 +81,16 @@ void parse_list_objects_filter(
 
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset);
+int opt_set_blob_none_filter(const struct option *opt,
+			     const char *arg, int unset);
 
 #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
 	{ OPTION_CALLBACK, 0, CL_ARG__FILTER, fo, N_("args"), \
 	  N_("object filtering"), 0, \
-	  opt_parse_list_objects_filter }
+	  opt_parse_list_objects_filter }, \
+	{ OPTION_CALLBACK, 0, CL_ARG__PARTIAL, fo, N_("size"), \
+	  N_("partial clone with blob filter"), \
+	  PARSE_OPT_OPTARG | PARSE_OPT_NONEG , opt_set_blob_none_filter }
 
 /*
  * Translates abbreviated numbers in the filter's filter_spec into their
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 77bb91e9769..c42cef61296 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -33,17 +33,39 @@ test_expect_success 'setup bare clone for server' '
 # confirm we are missing all of the known blobs.
 # confirm partial clone was registered in the local config.
 test_expect_success 'do partial clone 1' '
-	git clone --no-checkout --filter=blob:none "file://$(pwd)/srv.bare" pc1 &&
-
-	git -C pc1 rev-list --quiet --objects --missing=print HEAD >revs &&
-	awk -f print_1.awk revs |
-	sed "s/?//" |
-	sort >observed.oids &&
+	for option in "--filter=blob:none" "--partial"
+	do
+		rm -rf pc1 &&
+		git clone --no-checkout "$option" "file://$(pwd)/srv.bare" pc1 &&
+
+		git -C pc1 rev-list --quiet --objects --missing=print HEAD >revs &&
+		awk -f print_1.awk revs |
+		sed "s/?//" |
+		sort >observed.oids &&
+
+		test_cmp expect_1.oids observed.oids &&
+		test "$(git -C pc1 config --local core.repositoryformatversion)" = "1" &&
+		test "$(git -C pc1 config --local remote.origin.promisor)" = "true" &&
+		test "$(git -C pc1 config --local remote.origin.partialclonefilter)" = "blob:none"
+	done
+'
 
-	test_cmp expect_1.oids observed.oids &&
-	test "$(git -C pc1 config --local core.repositoryformatversion)" = "1" &&
-	test "$(git -C pc1 config --local remote.origin.promisor)" = "true" &&
-	test "$(git -C pc1 config --local remote.origin.partialclonefilter)" = "blob:none"
+test_expect_success 'do partial clone with size limit' '
+	for option in "--filter=blob:limit=1" "--partial=1"
+	do
+		rm -rf pc-limit &&
+		git clone --no-checkout "$option" "file://$(pwd)/srv.bare" pc-limit &&
+
+		git -C pc-limit rev-list --quiet --objects --missing=print HEAD >revs &&
+		awk -f print_1.awk revs |
+		sed "s/?//" |
+		sort >observed.oids &&
+
+		test_cmp expect_1.oids observed.oids &&
+		test "$(git -C pc-limit config --local core.repositoryformatversion)" = "1" &&
+		test "$(git -C pc-limit config --local remote.origin.promisor)" = "true" &&
+		test "$(git -C pc-limit config --local remote.origin.partialclonefilter)" = "blob:limit=1"
+	done
 '
 
 test_expect_success 'verify that .promisor file contains refs fetched' '
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/2] clone: document --partial and --filter options
  2020-03-19 17:28 [PATCH 0/2] Slightly simplify partial clone user experience Derrick Stolee via GitGitGadget
  2020-03-19 17:28 ` [PATCH 1/2] partial-clone: set default filter with --partial Derrick Stolee via GitGitGadget
@ 2020-03-19 17:28 ` Derrick Stolee via GitGitGadget
  2020-03-20 12:27 ` [PATCH 0/2] Slightly simplify partial clone user experience Derrick Stolee
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-03-19 17:28 UTC (permalink / raw)
  To: git
  Cc: me, jonathantanmy, christian.couder, git, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The previous change added the "--partial[=<size>]" option to
"git clone" equivalent to "--filter=blob:none" or
"--filter=blob:limit=<size>" but did not document that addition.
It turns out that the "--filter=<filter-spec>" option was not
documented anywhere in the "git clone" page, and instead is
detailed carefully in "git rev-list" where it serves a
different purpose.

Add a small bit about these options in the documentation. It
would be worth some time to create a subsection in the "git clone"
documentation about partial clone as a concept and how it can be
a surprising experience. For example, "git checkout" will likely
trigger a pack download.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-clone.txt | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index bf24f1813ad..eafa1c39927 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -15,7 +15,8 @@ SYNOPSIS
 	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch] [--no-tags]
 	  [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
-	  [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--] <repository>
+	  [--[no-]remote-submodules] [--jobs <n>] [--sparse]
+	  [--partial[=<size>]|--filter=<filter>] [--] <repository>
 	  [<directory>]
 
 DESCRIPTION
@@ -162,6 +163,18 @@ objects from the source repository into a pack in the cloned repository.
 	of the repository. The sparse-checkout file can be
 	modified to grow the working directory as needed.
 
+--partial[=<size>]::
+--filter=<filter-spec>::
+	Use the partial clone feature and request that the server sends
+	a subset of reachable objects according to a given object filter.
+	When using `--filter`, the supplied `<filter-spec>` is used for
+	the partial clone filter. When using `--partial` with no `<size>`,
+	the `blob:none` filter is applied to filter all blobs. When using
+	`--partial=<size>` the `blob:limit=<size>` filter is applied to
+	filter all blobs with size larger than `<size>`. For more details
+	on filter specifications, see the `--filter` option in
+	linkgit:git-rev-list[1].
+
 --mirror::
 	Set up a mirror of the source repository.  This implies `--bare`.
 	Compared to `--bare`, `--mirror` not only maps local branches of the
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Slightly simplify partial clone user experience
  2020-03-19 17:28 [PATCH 0/2] Slightly simplify partial clone user experience Derrick Stolee via GitGitGadget
  2020-03-19 17:28 ` [PATCH 1/2] partial-clone: set default filter with --partial Derrick Stolee via GitGitGadget
  2020-03-19 17:28 ` [PATCH 2/2] clone: document --partial and --filter options Derrick Stolee via GitGitGadget
@ 2020-03-20 12:27 ` Derrick Stolee
  2020-03-22  9:51 ` Jeff King
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Derrick Stolee @ 2020-03-20 12:27 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: me, jonathantanmy, christian.couder, git, Derrick Stolee

On 3/19/2020 1:28 PM, Derrick Stolee via GitGitGadget wrote:
> REQUEST FOR HELP: If anyone out there is looking for an opportunity to
> contribute documentation, I feel there is a need to include a section in the 
> git clone documentation devoted to partial clone. Hopefully such a section
> would include answers to these questions:
> 
>  1. What is a partial clone?
>  2. Why would I want a partial clone?
>  3. How do I need to be careful when using a partial clone?
> 
> I don't have time to write such a document right now, which is why I'm
> asking for help if anyone is looking for a non-code way to contribute to the
> project.

I reached out on Twitter to see if anyone was interested, and I got a
response. I'm working with them to get started on the mailing list. I'll
send a proper introduction when I know which email address they plan to
use on the list.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] partial-clone: set default filter with --partial
  2020-03-19 17:28 ` [PATCH 1/2] partial-clone: set default filter with --partial Derrick Stolee via GitGitGadget
@ 2020-03-20 20:26   ` Junio C Hamano
  2020-03-20 20:38     ` Derrick Stolee
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2020-03-20 20:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, jonathantanmy, christian.couder, git, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index 256bcfbdfe6..a71716ef75e 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -270,6 +270,24 @@ int opt_parse_list_objects_filter(const struct option *opt,
>  	return 0;
>  }
>  
> +int opt_set_blob_none_filter(const struct option *opt,
> +			     const char *arg, int unset)

Isn't the convention to use "opt_parse_" for canned parse-options
callbacks?

> +{
> +	struct strbuf filter_arg = STRBUF_INIT;
> +	struct list_objects_filter_options *filter_options = opt->value;
> +	
> +	if (unset || !arg || !strcmp(arg, "0")) {
> +		parse_list_objects_filter(filter_options, "blob:none");
> +		return 0;

If "--no-partial" were allowed, it should be usable to countermand
"--partial" earlier on the command line or perhaps configured
default.  But the above (especially the "unset ||" part) makes
"--no-partial" a synonym to "--filter=blob:none", no?

> +	}
> +	
> +	strbuf_addf(&filter_arg, "blob:limit=%s", arg);
> +	parse_list_objects_filter(filter_options, filter_arg.buf);
> +	strbuf_release(&filter_arg);

I would have expected the body of the function to read more like
this:

	if (unset) {
        	... clear filter_options stored in opt->value ...
	} else {
		struct strbuf filter_arg = STRBUF_INIT;
		if (!arg)
			strbuf_addstr(&filter_arg, "blob:none");
		else
			strbuf_addf(&filter_arg, "blob:limit=%s", arg);
		parse_list_objects_filter(filter_options, filter_arg.buf);
		strbuf_release(&filter_arg);
	}

Specifically, I find it unsatisifying to see the "0" optimization at
this level.  Shouldn't it be done in parse_list_objects_filter() to
parse "blob:limit=<num>" and after realizing <num> is zero, pretend
as if it got "blob:none" to optimize?  That way, people can even say
"--partial=0k" and get it interpreted as "--filter=blob:none", right?

>  #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
>  	{ OPTION_CALLBACK, 0, CL_ARG__FILTER, fo, N_("args"), \
>  	  N_("object filtering"), 0, \
> -	  opt_parse_list_objects_filter }
> +	  opt_parse_list_objects_filter }, \
> +	{ OPTION_CALLBACK, 0, CL_ARG__PARTIAL, fo, N_("size"), \
> +	  N_("partial clone with blob filter"), \
> +	  PARSE_OPT_OPTARG | PARSE_OPT_NONEG , opt_set_blob_none_filter }

PARSE_OPT_NONEG means "--no-partial" is forbidden and the callback
won't see unset==1 at all, right?

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] partial-clone: set default filter with --partial
  2020-03-20 20:26   ` Junio C Hamano
@ 2020-03-20 20:38     ` Derrick Stolee
  2020-03-22  9:46       ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Derrick Stolee @ 2020-03-20 20:38 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, jonathantanmy, christian.couder, git, Derrick Stolee



On 3/20/2020 4:26 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
>> index 256bcfbdfe6..a71716ef75e 100644
>> --- a/list-objects-filter-options.c
>> +++ b/list-objects-filter-options.c
>> @@ -270,6 +270,24 @@ int opt_parse_list_objects_filter(const struct option *opt,
>>  	return 0;
>>  }
>>  
>> +int opt_set_blob_none_filter(const struct option *opt,
>> +			     const char *arg, int unset)
> 
> Isn't the convention to use "opt_parse_" for canned parse-options
> callbacks?

I can do that.

>> +{
>> +	struct strbuf filter_arg = STRBUF_INIT;
>> +	struct list_objects_filter_options *filter_options = opt->value;
>> +	
>> +	if (unset || !arg || !strcmp(arg, "0")) {
>> +		parse_list_objects_filter(filter_options, "blob:none");
>> +		return 0;
> 
> If "--no-partial" were allowed, it should be usable to countermand
> "--partial" earlier on the command line or perhaps configured
> default.  But the above (especially the "unset ||" part) makes
> "--no-partial" a synonym to "--filter=blob:none", no?

I should have been more careful about the use of "unset" (which
would never be true with the current option parsing).

>> +	}
>> +	
>> +	strbuf_addf(&filter_arg, "blob:limit=%s", arg);
>> +	parse_list_objects_filter(filter_options, filter_arg.buf);
>> +	strbuf_release(&filter_arg);
> 
> I would have expected the body of the function to read more like
> this:
> 
> 	if (unset) {
>         	... clear filter_options stored in opt->value ...
> 	} else {
> 		struct strbuf filter_arg = STRBUF_INIT;
> 		if (!arg)
> 			strbuf_addstr(&filter_arg, "blob:none");
> 		else
> 			strbuf_addf(&filter_arg, "blob:limit=%s", arg);
> 		parse_list_objects_filter(filter_options, filter_arg.buf);
> 		strbuf_release(&filter_arg);
> 	}

This is a better organization and I will use it.

> Specifically, I find it unsatisifying to see the "0" optimization at
> this level.  Shouldn't it be done in parse_list_objects_filter() to
> parse "blob:limit=<num>" and after realizing <num> is zero, pretend
> as if it got "blob:none" to optimize?  That way, people can even say
> "--partial=0k" and get it interpreted as "--filter=blob:none", right?

I suppose it would be worth checking the recent server-side improvements
to see if they translate a limit=0k to a "size 0" and then ignore the
size check and simply remove all blobs.

>>  #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
>>  	{ OPTION_CALLBACK, 0, CL_ARG__FILTER, fo, N_("args"), \
>>  	  N_("object filtering"), 0, \
>> -	  opt_parse_list_objects_filter }
>> +	  opt_parse_list_objects_filter }, \
>> +	{ OPTION_CALLBACK, 0, CL_ARG__PARTIAL, fo, N_("size"), \
>> +	  N_("partial clone with blob filter"), \
>> +	  PARSE_OPT_OPTARG | PARSE_OPT_NONEG , opt_set_blob_none_filter }
> 
> PARSE_OPT_NONEG means "--no-partial" is forbidden and the callback
> won't see unset==1 at all, right?

You're right. I'm being inconsistent. Your reasons above point to a
good reason to have --no-partial available.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] partial-clone: set default filter with --partial
  2020-03-20 20:38     ` Derrick Stolee
@ 2020-03-22  9:46       ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2020-03-22  9:46 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git, me,
	jonathantanmy, christian.couder, git, Derrick Stolee

On Fri, Mar 20, 2020 at 04:38:27PM -0400, Derrick Stolee wrote:

> > Specifically, I find it unsatisifying to see the "0" optimization at
> > this level.  Shouldn't it be done in parse_list_objects_filter() to
> > parse "blob:limit=<num>" and after realizing <num> is zero, pretend
> > as if it got "blob:none" to optimize?  That way, people can even say
> > "--partial=0k" and get it interpreted as "--filter=blob:none", right?
> 
> I suppose it would be worth checking the recent server-side improvements
> to see if they translate a limit=0k to a "size 0" and then ignore the
> size check and simply remove all blobs.

The new bitmap code doesn't do anything special there. It does rely on
the normal filter parsing, though. If we rewrote it at that level,
perhaps like this (completely untested):

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 256bcfbdfe..0225b61912 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -49,7 +49,10 @@ static int gently_parse_list_objects_filter(
 
 	} else if (skip_prefix(arg, "blob:limit=", &v0)) {
 		if (git_parse_ulong(v0, &filter_options->blob_limit_value)) {
-			filter_options->choice = LOFC_BLOB_LIMIT;
+			if (filter_options->blob_limit_value)
+				filter_options->choice = LOFC_BLOB_LIMIT;
+			else
+				filter_options->choice = LOFC_BLOB_NONE;
 			return 0;
 		}
 

then it would just work for regular and bitmapped filters.

One interesting user-visible effect would be in the patches we've been
discussing to limit which filters are allowed. If you allowed, say,
blob:none but not blob:limit, this would quietly allow blob:limit=0
(because it really _is_ blob:none under the hood). 

I'm not sure if that would be confusing or convenient. I doubt anybody
cares much for the blob filters (you'd either enable neither or both,
because they cost about the same). But in another thread, I mentioned
that doing "tree:depth=0" _would_ be cheap to do with bitmaps, but
tree:depth=1" wouldn't. If we quietly rewrote the former to tree:none
(which doesn't yet exist, but could), that would let you distinguish
between the two (of course if tree:none existed, perhaps nobody would
have any reason to write tree:depth=0 in the first place).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Slightly simplify partial clone user experience
  2020-03-19 17:28 [PATCH 0/2] Slightly simplify partial clone user experience Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-03-20 12:27 ` [PATCH 0/2] Slightly simplify partial clone user experience Derrick Stolee
@ 2020-03-22  9:51 ` Jeff King
  2020-03-22 10:58   ` Christian Couder
  2020-03-22 16:03 ` Taylor Blau
  2020-03-22 19:50 ` [PATCH v2] clone: document --filter options Derrick Stolee via GitGitGadget
  5 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2020-03-22  9:51 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, jonathantanmy, christian.couder, git, Derrick Stolee

On Thu, Mar 19, 2020 at 05:28:05PM +0000, Derrick Stolee via GitGitGadget wrote:

> This was something discussed briefly at the contributor summit: users will
> have a hard time remembering git clone --filter=blob:none <url>. This series
> simply adds a --partial option that is equivalent to --filter=blob:none,
> with the ability to specify a size using --partial=<size> that is equivalent
> to --filter=blob:limit=<size>.

I have mixed feelings on this. I do like making things less arcane for
users. But are we locking in a behavior for --partial that we might not
want to live with forever? I.e., the current thinking for partial clones
is to fetch no blobs at all, get all commits and trees, apply sparse
filters, and then fault in the blobs we need. But imagine we later grow
the ability to easily avoid fetching all of the trees. Would we regret
having the simple name "--partial" taken?

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Slightly simplify partial clone user experience
  2020-03-22  9:51 ` Jeff King
@ 2020-03-22 10:58   ` Christian Couder
  2020-03-22 16:45     ` Derrick Stolee
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Couder @ 2020-03-22 10:58 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee via GitGitGadget, git, Taylor Blau, Jonathan Tan,
	Jeff Hostetler, Derrick Stolee

On Sun, Mar 22, 2020 at 10:51 AM Jeff King <peff@peff.net> wrote:
>
> On Thu, Mar 19, 2020 at 05:28:05PM +0000, Derrick Stolee via GitGitGadget wrote:
>
> > This was something discussed briefly at the contributor summit: users will
> > have a hard time remembering git clone --filter=blob:none <url>. This series
> > simply adds a --partial option that is equivalent to --filter=blob:none,
> > with the ability to specify a size using --partial=<size> that is equivalent
> > to --filter=blob:limit=<size>.
>
> I have mixed feelings on this. I do like making things less arcane for
> users. But are we locking in a behavior for --partial that we might not
> want to live with forever? I.e., the current thinking for partial clones
> is to fetch no blobs at all, get all commits and trees, apply sparse
> filters, and then fault in the blobs we need. But imagine we later grow
> the ability to easily avoid fetching all of the trees. Would we regret
> having the simple name "--partial" taken?

I agree with that. Something like "--filter-blobs" for
"--filter=blob:none" and perhaps "--filter-blobs=<size>" for
"--filter=blob:limit=<size>" might be worth it though.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Slightly simplify partial clone user experience
  2020-03-19 17:28 [PATCH 0/2] Slightly simplify partial clone user experience Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2020-03-22  9:51 ` Jeff King
@ 2020-03-22 16:03 ` Taylor Blau
  2020-03-22 19:50 ` [PATCH v2] clone: document --filter options Derrick Stolee via GitGitGadget
  5 siblings, 0 replies; 15+ messages in thread
From: Taylor Blau @ 2020-03-22 16:03 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, jonathantanmy, christian.couder, git, Derrick Stolee

On Thu, Mar 19, 2020 at 05:28:05PM +0000, Derrick Stolee via GitGitGadget wrote:
> This was something discussed briefly at the contributor summit: users will
> have a hard time remembering git clone --filter=blob:none <url>.

I think that this is probably my fault :).

I was saying at the contributor summit that I think I may have created
more confusion than clarity when I was describing how to use partial
clones in my last release blog post.

In that post, I introduced partial clones by asking and answering the
following series of four questions.

  1. What do you do when your repository is too large? Partial clones.
  2. How do you only fetch down some of the objects? Object filters.
  3. How do you restrict which parts of your working copy are populated?
     Write to '.git/info/sparse-checkout'.
  4. Simplify things even further by using 'git sparse-checkout' instead.

I think that some of the criticism was along the lines of "you said that
partial clones would be easy, but nothing in the steps 2-4 is
straightforward".

I think that things would have been much clearer if I said:

  If your repository is too large to clone all at once, or has a
  checkout footprint that is too big, you can issue a partial clone
  against it by writing the following:

    $ git clone --filter=blob:none --sparse /path/to/your/repo

> This series simply adds a --partial option that is equivalent to
> --filter=blob:none, with the ability to specify a size using
> --partial=<size> that is equivalent to --filter=blob:limit=<size>.

I think that I have mixed feelings about this, too. On the one hand, it
irrefutably makes the above incantation a lot easier to write out and
remember. On the other hand, I worry about it locking in options that we
don't want to keep around forever, like adding more filters, changing
some rules about sparse checkout and so on.

Maybe these concerns aren't well-founded, and that we're not likely to
change either of those anytime soon. But, I'd rather be wrong and not be
squatting '--partial[=<n>]' than not.

> While going to the git clone documentation to add this option, I noticed
> that there is no reference to the --filter option there, nor any discussion
> of partial clone. I resolved the former.
>
> REQUEST FOR HELP: If anyone out there is looking for an opportunity to
> contribute documentation, I feel there is a need to include a section in the
> git clone documentation devoted to partial clone. Hopefully such a section
> would include answers to these questions:
>
>  1. What is a partial clone?
>  2. Why would I want a partial clone?
>  3. How do I need to be careful when using a partial clone?
>
> I don't have time to write such a document right now, which is why I'm
> asking for help if anyone is looking for a non-code way to contribute to the
> project.

This seems like a worthy goal outside of this series, and I would
certainly love to have some more documentation in this area.

> Thanks, -Stolee
>
> Derrick Stolee (2):
>   partial-clone: set default filter with --partial
>   clone: document --partial and --filter options
>
>  Documentation/git-clone.txt   | 15 ++++++++++++-
>  list-objects-filter-options.c | 18 +++++++++++++++
>  list-objects-filter-options.h |  8 ++++++-
>  t/t5616-partial-clone.sh      | 42 ++++++++++++++++++++++++++---------
>  4 files changed, 71 insertions(+), 12 deletions(-)
>
>
> base-commit: 6c85aac65fb455af85745130ce35ddae4678db84
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-586%2Fderrickstolee%2Fpartial-clone-ux-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-586/derrickstolee/partial-clone-ux-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/586
> --
> gitgitgadget

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Slightly simplify partial clone user experience
  2020-03-22 10:58   ` Christian Couder
@ 2020-03-22 16:45     ` Derrick Stolee
  2020-03-22 19:22       ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Derrick Stolee @ 2020-03-22 16:45 UTC (permalink / raw)
  To: Christian Couder, Jeff King
  Cc: Derrick Stolee via GitGitGadget, git, Taylor Blau, Jonathan Tan,
	Jeff Hostetler, Derrick Stolee

On 3/22/2020 6:58 AM, Christian Couder wrote:
> On Sun, Mar 22, 2020 at 10:51 AM Jeff King <peff@peff.net> wrote:
>>
>> On Thu, Mar 19, 2020 at 05:28:05PM +0000, Derrick Stolee via GitGitGadget wrote:
>>
>>> This was something discussed briefly at the contributor summit: users will
>>> have a hard time remembering git clone --filter=blob:none <url>. This series
>>> simply adds a --partial option that is equivalent to --filter=blob:none,
>>> with the ability to specify a size using --partial=<size> that is equivalent
>>> to --filter=blob:limit=<size>.
>>
>> I have mixed feelings on this. I do like making things less arcane for
>> users. But are we locking in a behavior for --partial that we might not
>> want to live with forever? I.e., the current thinking for partial clones
>> is to fetch no blobs at all, get all commits and trees, apply sparse
>> filters, and then fault in the blobs we need. But imagine we later grow
>> the ability to easily avoid fetching all of the trees. Would we regret
>> having the simple name "--partial" taken?
> 
> I agree with that. Something like "--filter-blobs" for
> "--filter=blob:none" and perhaps "--filter-blobs=<size>" for
> "--filter=blob:limit=<size>" might be worth it though.

Thanks for the perspective on this. The --filter-blobs[=<size>] should
be less likely to collide with an alternative definition of "partial".

While we are thinking in this space, what if we had a "partial-clone"
builtin? It could be a light wrapper around "git clone" where

  git partial-clone [--limit=<size>] [options] <url> [<dir>]

would do the same thing as

  git clone --filter=blob:[none|limit=<size>] [options] <url> [<dir>]

Just spit-balling here.

In the meantime, I'll work to adjust my patches to only be the
documentation of the --filter option.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Slightly simplify partial clone user experience
  2020-03-22 16:45     ` Derrick Stolee
@ 2020-03-22 19:22       ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2020-03-22 19:22 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Christian Couder, Derrick Stolee via GitGitGadget, git,
	Taylor Blau, Jonathan Tan, Jeff Hostetler, Derrick Stolee

On Sun, Mar 22, 2020 at 12:45:29PM -0400, Derrick Stolee wrote:

> >> I have mixed feelings on this. I do like making things less arcane for
> >> users. But are we locking in a behavior for --partial that we might not
> >> want to live with forever? I.e., the current thinking for partial clones
> >> is to fetch no blobs at all, get all commits and trees, apply sparse
> >> filters, and then fault in the blobs we need. But imagine we later grow
> >> the ability to easily avoid fetching all of the trees. Would we regret
> >> having the simple name "--partial" taken?
> > 
> > I agree with that. Something like "--filter-blobs" for
> > "--filter=blob:none" and perhaps "--filter-blobs=<size>" for
> > "--filter=blob:limit=<size>" might be worth it though.
> 
> Thanks for the perspective on this. The --filter-blobs[=<size>] should
> be less likely to collide with an alternative definition of "partial".

Yes, though I wonder if it's really that big an improvement over
"--filter=blob:none".

> While we are thinking in this space, what if we had a "partial-clone"
> builtin? It could be a light wrapper around "git clone" where
> 
>   git partial-clone [--limit=<size>] [options] <url> [<dir>]
> 
> would do the same thing as
> 
>   git clone --filter=blob:[none|limit=<size>] [options] <url> [<dir>]
> 
> Just spit-balling here.

I think that just shifts my concern out to the other command; i.e.,
would we later want the partial-clone command to behave differently.
And introducing a wrapper creates a weird non-orthogonality where it's
not necessary. I _do_ think a wrapper could make sense when it's doing
substantially different things, or multiple steps that don't belong in
the original program (e.g., the way Scalar works).

> In the meantime, I'll work to adjust my patches to only be the
> documentation of the --filter option.

Thanks, that seems like an obvious and easy improvement.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] clone: document --filter options
  2020-03-19 17:28 [PATCH 0/2] Slightly simplify partial clone user experience Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2020-03-22 16:03 ` Taylor Blau
@ 2020-03-22 19:50 ` Derrick Stolee via GitGitGadget
  2020-03-24  3:40   ` Jeff King
  2020-03-24  5:17   ` Jonathan Nieder
  5 siblings, 2 replies; 15+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-03-22 19:50 UTC (permalink / raw)
  To: git
  Cc: me, jonathantanmy, christian.couder, git, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It turns out that the "--filter=<filter-spec>" option is not
documented anywhere in the "git clone" page, and instead is
detailed carefully in "git rev-list" where it serves a
different purpose.

Add a small bit about this option in the documentation. It
would be worth some time to create a subsection in the "git clone"
documentation about partial clone as a concept and how it can be
a surprising experience. For example, "git checkout" will likely
trigger a pack download.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    Slightly simplify partial clone user experience
    
    V2: Only update the documentation of --filter.
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-586%2Fderrickstolee%2Fpartial-clone-ux-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-586/derrickstolee/partial-clone-ux-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/586

Range-diff vs v1:

 1:  6f340d9aadf < -:  ----------- partial-clone: set default filter with --partial
 2:  9baf4c8ba38 ! 1:  a55c2d975ab clone: document --partial and --filter options
     @@ -1,16 +1,13 @@
      Author: Derrick Stolee <dstolee@microsoft.com>
      
     -    clone: document --partial and --filter options
     +    clone: document --filter options
      
     -    The previous change added the "--partial[=<size>]" option to
     -    "git clone" equivalent to "--filter=blob:none" or
     -    "--filter=blob:limit=<size>" but did not document that addition.
     -    It turns out that the "--filter=<filter-spec>" option was not
     +    It turns out that the "--filter=<filter-spec>" option is not
          documented anywhere in the "git clone" page, and instead is
          detailed carefully in "git rev-list" where it serves a
          different purpose.
      
     -    Add a small bit about these options in the documentation. It
     +    Add a small bit about this option in the documentation. It
          would be worth some time to create a subsection in the "git clone"
          documentation about partial clone as a concept and how it can be
          a surprising experience. For example, "git checkout" will likely
     @@ -27,7 +24,7 @@
       	  [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
      -	  [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--] <repository>
      +	  [--[no-]remote-submodules] [--jobs <n>] [--sparse]
     -+	  [--partial[=<size>]|--filter=<filter>] [--] <repository>
     ++	  [--filter=<filter>] [--] <repository>
       	  [<directory>]
       
       DESCRIPTION
     @@ -35,17 +32,15 @@
       	of the repository. The sparse-checkout file can be
       	modified to grow the working directory as needed.
       
     -+--partial[=<size>]::
      +--filter=<filter-spec>::
      +	Use the partial clone feature and request that the server sends
      +	a subset of reachable objects according to a given object filter.
      +	When using `--filter`, the supplied `<filter-spec>` is used for
     -+	the partial clone filter. When using `--partial` with no `<size>`,
     -+	the `blob:none` filter is applied to filter all blobs. When using
     -+	`--partial=<size>` the `blob:limit=<size>` filter is applied to
     -+	filter all blobs with size larger than `<size>`. For more details
     -+	on filter specifications, see the `--filter` option in
     -+	linkgit:git-rev-list[1].
     ++	the partial clone filter. For example, `--filter=blob:none` will
     ++	filter out all blobs (file contents) until needed by Git. Also,
     ++	`--filter=blob:limit=<size>` will filter out all blobs of size
     ++	at least `<size>`. For more details on filter specifications, see
     ++	the `--filter` option in linkgit:git-rev-list[1].
      +
       --mirror::
       	Set up a mirror of the source repository.  This implies `--bare`.


 Documentation/git-clone.txt | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index bf24f1813ad..08d6045c4a8 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -15,7 +15,8 @@ SYNOPSIS
 	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch] [--no-tags]
 	  [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
-	  [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--] <repository>
+	  [--[no-]remote-submodules] [--jobs <n>] [--sparse]
+	  [--filter=<filter>] [--] <repository>
 	  [<directory>]
 
 DESCRIPTION
@@ -162,6 +163,16 @@ objects from the source repository into a pack in the cloned repository.
 	of the repository. The sparse-checkout file can be
 	modified to grow the working directory as needed.
 
+--filter=<filter-spec>::
+	Use the partial clone feature and request that the server sends
+	a subset of reachable objects according to a given object filter.
+	When using `--filter`, the supplied `<filter-spec>` is used for
+	the partial clone filter. For example, `--filter=blob:none` will
+	filter out all blobs (file contents) until needed by Git. Also,
+	`--filter=blob:limit=<size>` will filter out all blobs of size
+	at least `<size>`. For more details on filter specifications, see
+	the `--filter` option in linkgit:git-rev-list[1].
+
 --mirror::
 	Set up a mirror of the source repository.  This implies `--bare`.
 	Compared to `--bare`, `--mirror` not only maps local branches of the

base-commit: 6c85aac65fb455af85745130ce35ddae4678db84
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] clone: document --filter options
  2020-03-22 19:50 ` [PATCH v2] clone: document --filter options Derrick Stolee via GitGitGadget
@ 2020-03-24  3:40   ` Jeff King
  2020-03-24  5:17   ` Jonathan Nieder
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff King @ 2020-03-24  3:40 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, jonathantanmy, christian.couder, git, Derrick Stolee

On Sun, Mar 22, 2020 at 07:50:06PM +0000, Derrick Stolee via GitGitGadget wrote:

>     Slightly simplify partial clone user experience
>     
>     V2: Only update the documentation of --filter.

Thanks, this part is definitely an improvement (and I read over the
proposed text again, and it looks very good).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] clone: document --filter options
  2020-03-22 19:50 ` [PATCH v2] clone: document --filter options Derrick Stolee via GitGitGadget
  2020-03-24  3:40   ` Jeff King
@ 2020-03-24  5:17   ` Jonathan Nieder
  1 sibling, 0 replies; 15+ messages in thread
From: Jonathan Nieder @ 2020-03-24  5:17 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, jonathantanmy, christian.couder, git, Derrick Stolee

Derrick Stolee wrote:

> It turns out that the "--filter=<filter-spec>" option is not
> documented anywhere in the "git clone" page, and instead is
> detailed carefully in "git rev-list" where it serves a
> different purpose.
>
> Add a small bit about this option in the documentation.
[...]
>  Documentation/git-clone.txt | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks.

>                                                         It
> would be worth some time to create a subsection in the "git clone"
> documentation about partial clone as a concept and how it can be
> a surprising experience. For example, "git checkout" will likely
> trigger a pack download.

I think that belongs in its own git-partial-clone(7) page.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-03-24  5:26 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-19 17:28 [PATCH 0/2] Slightly simplify partial clone user experience Derrick Stolee via GitGitGadget
2020-03-19 17:28 ` [PATCH 1/2] partial-clone: set default filter with --partial Derrick Stolee via GitGitGadget
2020-03-20 20:26   ` Junio C Hamano
2020-03-20 20:38     ` Derrick Stolee
2020-03-22  9:46       ` Jeff King
2020-03-19 17:28 ` [PATCH 2/2] clone: document --partial and --filter options Derrick Stolee via GitGitGadget
2020-03-20 12:27 ` [PATCH 0/2] Slightly simplify partial clone user experience Derrick Stolee
2020-03-22  9:51 ` Jeff King
2020-03-22 10:58   ` Christian Couder
2020-03-22 16:45     ` Derrick Stolee
2020-03-22 19:22       ` Jeff King
2020-03-22 16:03 ` Taylor Blau
2020-03-22 19:50 ` [PATCH v2] clone: document --filter options Derrick Stolee via GitGitGadget
2020-03-24  3:40   ` Jeff King
2020-03-24  5:17   ` Jonathan Nieder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).