git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: tom.saeger@oracle.com, gitster@pobox.com,
	sunshine@sunshineco.com, Derrick Stolee <stolee@gmail.com>,
	Josh Steadmon <steadmon@google.com>,
	Emily Shaffer <emilyshaffer@google.com>,
	Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH v3 0/3] Maintenance: adapt custom refspecs
Date: Sat, 10 Apr 2021 02:03:42 +0000	[thread overview]
Message-ID: <pull.924.v3.git.1618020225.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.924.v2.git.1617734870.gitgitgadget@gmail.com>

Tom Saeger rightly pointed out [1] that the prefetch task ignores custom
refspecs. This can lead to downloading more data than requested, and it
doesn't even help the future foreground fetches that use that custom
refspec.

[1]
https://lore.kernel.org/git/20210401184914.qmr7jhjbhp2mt3h6@dhcp-10-154-148-175.vpn.oracle.com/

This series fixes this problem by carefully replacing the start of each
refspec's destination with "refs/prefetch/". If the destination already
starts with "refs/", then that is replaced. Otherwise "refs/prefetch/" is
just prepended.

This happens inside of git fetch when a --prefetch option is given. This
allows us to maniuplate a struct refspec_item instead of a full refspec
string. It also simplifies our logic in testing the prefetch task.


Update in V3
============

 * The fix is almost completely rewritten as an update to 'git fetch'. See
   the new PATCH 2 for this update.

 * There was some discussion of rewriting test_subcommand, but that can be
   delayed until a proper solution is found to complications around softer
   matches.


Updates in V2
=============

Thanks for the close eye on this series. I appreciate the recommendations,
which I believe I have responded to them all:

 * Fixed typos.
 * Made refspec_item_format() re-entrant. Consumers must free the buffer.
 * Cleaned up style (quoting and tabbing).

Thanks, -Stolee

Derrick Stolee (3):
  maintenance: simplify prefetch logic
  fetch: add --prefetch option
  maintenance: use 'git fetch --prefetch'

 Documentation/fetch-options.txt   |  5 +++
 Documentation/git-maintenance.txt |  6 ++--
 builtin/fetch.c                   | 56 +++++++++++++++++++++++++++++++
 builtin/gc.c                      | 36 +++++---------------
 t/t5582-fetch-negative-refspec.sh | 30 +++++++++++++++++
 t/t7900-maintenance.sh            | 14 ++++----
 6 files changed, 109 insertions(+), 38 deletions(-)


base-commit: 89b43f80a514aee58b662ad606e6352e03eaeee4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-924%2Fderrickstolee%2Fmaintenance%2Frefspec-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-924/derrickstolee/maintenance/refspec-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/924

Range-diff vs v2:

 1:  5aa0cb06c3f2 = 1:  4c0e983ba56f maintenance: simplify prefetch logic
 2:  d58a3e042ee8 < -:  ------------ test-lib: use exact match for test_subcommand
 3:  96388d949b98 < -:  ------------ refspec: output a refspec item
 4:  bf296282323a < -:  ------------ test-tool: test refspec input/output
 -:  ------------ > 2:  7f488eea6dbd fetch: add --prefetch option
 5:  9592224e3d42 ! 3:  ed055d772452 maintenance: allow custom refspecs during prefetch
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    maintenance: allow custom refspecs during prefetch
     +    maintenance: use 'git fetch --prefetch'
      
     -    The prefetch task previously used the default refspec source plus a
     -    custom refspec destination to avoid colliding with remote refs:
     +    The 'prefetch' maintenance task previously forced the following refspec
     +    for each remote:
      
                  +refs/heads/*:refs/prefetch/<remote>/*
      
     -    However, some users customize their refspec to reduce how much data they
     -    download from specific remotes. This can involve restrictive patterns
     -    for fetching or negative patterns to avoid downloading some refs.
     +    If a user has specified a more strict refspec for the remote, then this
     +    prefetch task downloads more objects than necessary.
      
     -    Modify fetch_remote() to iterate over the remote's refspec list and
     -    translate that into the appropriate prefetch scenario. Specifically,
     -    re-parse the raw form of the refspec into a new 'struct refspec' and
     -    modify the 'dst' member to replace a leading "refs/" substring with
     -    "refs/prefetch/", or prepend "refs/prefetch/" to 'dst' otherwise.
     -    Negative refspecs do not have a 'dst' so they can be transferred to the
     -    'git fetch' command unmodified.
     -
     -    This prefix change provides the benefit of keeping whatever collisions
     -    may exist in the custom refspecs, if that is a desirable outcome.
     -
     -    This changes the names of the refs that would be fetched by the default
     -    refspec. Instead of "refs/prefetch/<remote>/<branch>" they will now go
     -    to "refs/prefetch/remotes/<remote>/<branch>". While this is a change, it
     -    is not a seriously breaking one: these refs are intended to be hidden
     -    and not used.
     +    The previous change introduced the '--prefetch' option to 'git fetch'
     +    which manipulates the remote's refspec to place all resulting refs into
     +    refs/prefetch/, with further partitioning based on the destinations of
     +    those refspecs.
      
          Update the documentation to be more generic about the destination refs.
     -    Do not mention custom refpecs explicitly, as that does not need to be
     +    Do not mention custom refspecs explicitly, as that does not need to be
          highlighted in this documentation. The important part of placing refs in
     -    refs/prefetch remains.
     +    refs/prefetch/ remains.
      
          Reported-by: Tom Saeger <tom.saeger@oracle.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/git-maintenance.txt ##
     -@@ Documentation/git-maintenance.txt: prefetch::
     +@@ Documentation/git-maintenance.txt: commit-graph::
     + prefetch::
     + 	The `prefetch` task updates the object directory with the latest
       	objects from all registered remotes. For each remote, a `git fetch`
     - 	command is run. The refmap is custom to avoid updating local or remote
     - 	branches (those in `refs/heads` or `refs/remotes`). Instead, the
     +-	command is run. The refmap is custom to avoid updating local or remote
     +-	branches (those in `refs/heads` or `refs/remotes`). Instead, the
      -	remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are
      -	not updated.
     -+	refs are stored in `refs/prefetch/`. Also, tags are not updated.
     ++	command is run. The configured refspec is modified to place all
     ++	requested refs within `refs/prefetch/`. Also, tags are not updated.
       +
       This is done to avoid disrupting the remote-tracking branches. The end users
       expect these refs to stay unmoved unless they initiate a fetch.  With prefetch
      
       ## builtin/gc.c ##
     -@@
     - #include "remote.h"
     - #include "object-store.h"
     - #include "exec-cmd.h"
     -+#include "refspec.h"
     - 
     - #define FAILED_RUN "failed to run %s"
     - 
      @@ builtin/gc.c: static int fetch_remote(struct remote *remote, void *cbdata)
     - {
     - 	struct maintenance_run_opts *opts = cbdata;
       	struct child_process child = CHILD_PROCESS_INIT;
     -+	int i;
       
       	child.git_cmd = 1;
     - 	strvec_pushl(&child.args, "fetch", remote->name, "--prune", "--no-tags",
     -@@ builtin/gc.c: static int fetch_remote(struct remote *remote, void *cbdata)
     +-	strvec_pushl(&child.args, "fetch", remote->name, "--prune", "--no-tags",
     ++	strvec_pushl(&child.args, "fetch", remote->name,
     ++		     "--prefetch", "--prune", "--no-tags",
     + 		     "--no-write-fetch-head", "--recurse-submodules=no",
     +-		     "--refmap=", NULL);
     ++		     NULL);
     + 
       	if (opts->quiet)
       		strvec_push(&child.args, "--quiet");
       
      -	strvec_pushf(&child.args, "+refs/heads/*:refs/prefetch/%s/*", remote->name);
     -+	for (i = 0; i < remote->fetch.nr; i++) {
     -+		struct refspec_item replace;
     -+		struct refspec_item *rsi = &remote->fetch.items[i];
     -+		struct strbuf new_dst = STRBUF_INIT;
     -+		size_t ignore_len = 0;
     -+		char *replace_string;
     -+
     -+		if (rsi->negative) {
     -+			strvec_push(&child.args, remote->fetch.raw[i]);
     -+			continue;
     -+		}
     -+
     -+		refspec_item_init(&replace, remote->fetch.raw[i], 1);
     -+
     -+		/*
     -+		 * If a refspec dst starts with "refs/" at the start,
     -+		 * then we will replace "refs/" with "refs/prefetch/".
     -+		 * Otherwise, we will prepend the dst string with
     -+		 * "refs/prefetch/".
     -+		 */
     -+		if (!strncmp(replace.dst, "refs/", 5))
     -+			ignore_len = 5;
     -+
     -+		strbuf_addstr(&new_dst, "refs/prefetch/");
     -+		strbuf_addstr(&new_dst, replace.dst + ignore_len);
     -+		free(replace.dst);
     -+		replace.dst = strbuf_detach(&new_dst, NULL);
     -+
     -+		replace_string = refspec_item_format(&replace);
     -+		strvec_push(&child.args, replace_string);
     -+		free(replace_string);
     -+
     -+		refspec_item_clear(&replace);
     -+	}
     - 
     +-
       	return !!run_command(&child);
       }
     + 
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'prefetch multiple remotes' '
     + 	test_commit -C clone1 one &&
       	test_commit -C clone2 two &&
       	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
     - 	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
     --	test_subcommand git fetch remote1 $fetchargs "+refs/heads/*:refs/prefetch/remote1/*" <run-prefetch.txt &&
     --	test_subcommand git fetch remote2 $fetchargs "+refs/heads/*:refs/prefetch/remote2/*" <run-prefetch.txt &&
     -+	test_subcommand git fetch remote1 $fetchargs "+refs/heads/*:refs/prefetch/remotes/remote1/*" <run-prefetch.txt &&
     -+	test_subcommand git fetch remote2 $fetchargs "+refs/heads/*:refs/prefetch/remotes/remote2/*" <run-prefetch.txt &&
     +-	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
     +-	test_subcommand git fetch remote1 $fetchargs +refs/heads/\\*:refs/prefetch/remote1/\\* <run-prefetch.txt &&
     +-	test_subcommand git fetch remote2 $fetchargs +refs/heads/\\*:refs/prefetch/remote2/\\* <run-prefetch.txt &&
     ++	fetchargs="--prefetch --prune --no-tags --no-write-fetch-head --recurse-submodules=no --quiet" &&
     ++	test_subcommand git fetch remote1 $fetchargs <run-prefetch.txt &&
     ++	test_subcommand git fetch remote2 $fetchargs <run-prefetch.txt &&
       	test_path_is_missing .git/refs/remotes &&
      -	git log prefetch/remote1/one &&
      -	git log prefetch/remote2/two &&
     @@ t/t7900-maintenance.sh: test_expect_success 'prefetch multiple remotes' '
       
       	test_cmp_config refs/prefetch/ log.excludedecoration &&
       	git log --oneline --decorate --all >log &&
     - 	! grep "prefetch" log
     - '
     - 
     -+test_expect_success 'prefetch custom refspecs' '
     -+	git -C clone1 branch -f special/fetched HEAD &&
     -+	git -C clone1 branch -f special/secret/not-fetched HEAD &&
     -+
     -+	# create multiple refspecs for remote1
     -+	git config --add remote.remote1.fetch "+refs/heads/special/fetched:refs/heads/fetched" &&
     -+	git config --add remote.remote1.fetch "^refs/heads/special/secret/not-fetched" &&
     -+
     -+	GIT_TRACE2_EVENT="$(pwd)/prefetch-refspec.txt" git maintenance run --task=prefetch 2>/dev/null &&
     -+
     -+	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
     -+
     -+	# skips second refspec because it is not a pattern type
     -+	rs1="+refs/heads/*:refs/prefetch/remotes/remote1/*" &&
     -+	rs2="+refs/heads/special/fetched:refs/prefetch/heads/fetched" &&
     -+	rs3="^refs/heads/special/secret/not-fetched" &&
     -+
     -+	test_subcommand git fetch remote1 $fetchargs "$rs1" "$rs2" "$rs3" <prefetch-refspec.txt &&
     -+	test_subcommand git fetch remote2 $fetchargs "+refs/heads/*:refs/prefetch/remotes/remote2/*" <prefetch-refspec.txt &&
     -+
     -+	# first refspec is overridden by second
     -+	test_must_fail git rev-parse refs/prefetch/special/fetched &&
     -+	git rev-parse refs/prefetch/heads/fetched &&
     -+
     -+	# possible incorrect places for the non-fetched ref
     -+	test_must_fail git rev-parse refs/prefetch/remotes/remote1/secret/not-fetched &&
     -+	test_must_fail git rev-parse refs/prefetch/remotes/remote1/not-fetched &&
     -+	test_must_fail git rev-parse refs/heads/secret/not-fetched &&
     -+	test_must_fail git rev-parse refs/heads/not-fetched
     -+'
     -+
     - test_expect_success 'prefetch and existing log.excludeDecoration values' '
     - 	git config --unset-all log.excludeDecoration &&
     - 	git config log.excludeDecoration refs/remotes/remote1/ &&

-- 
gitgitgadget

  parent reply	other threads:[~2021-04-10  2:04 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-05 13:04 [PATCH 0/5] Maintenance: adapt custom refspecs Derrick Stolee via GitGitGadget
2021-04-05 13:04 ` [PATCH 1/5] maintenance: simplify prefetch logic Derrick Stolee via GitGitGadget
2021-04-05 17:01   ` Tom Saeger
2021-04-05 13:04 ` [PATCH 2/5] test-lib: use exact match for test_subcommand Derrick Stolee via GitGitGadget
2021-04-05 17:31   ` Eric Sunshine
2021-04-05 17:43     ` Junio C Hamano
2021-04-05 13:04 ` [PATCH 3/5] refspec: output a refspec item Derrick Stolee via GitGitGadget
2021-04-05 16:57   ` Tom Saeger
2021-04-05 17:40     ` Eric Sunshine
2021-04-05 17:44       ` Junio C Hamano
2021-04-06 11:21         ` Derrick Stolee
2021-04-06 15:23           ` Eric Sunshine
2021-04-06 16:51             ` Derrick Stolee
2021-04-07  8:46   ` Ævar Arnfjörð Bjarmason
2021-04-07 20:53     ` Derrick Stolee
2021-04-07 22:05       ` Ævar Arnfjörð Bjarmason
2021-04-07 22:49         ` Junio C Hamano
2021-04-07 23:01           ` Ævar Arnfjörð Bjarmason
2021-04-08  7:33             ` Junio C Hamano
2021-04-05 13:04 ` [PATCH 4/5] test-tool: test refspec input/output Derrick Stolee via GitGitGadget
2021-04-05 17:52   ` Eric Sunshine
2021-04-06 11:13     ` Derrick Stolee
2021-04-07  8:54   ` Ævar Arnfjörð Bjarmason
2021-04-05 13:04 ` [PATCH 5/5] maintenance: allow custom refspecs during prefetch Derrick Stolee via GitGitGadget
2021-04-05 17:16   ` Tom Saeger
2021-04-06 11:15     ` Derrick Stolee
2021-04-07  8:53   ` Ævar Arnfjörð Bjarmason
2021-04-07 10:26     ` Ævar Arnfjörð Bjarmason
2021-04-09 11:48       ` Derrick Stolee
2021-04-09 19:28         ` Ævar Arnfjörð Bjarmason
2021-04-10  0:56           ` Derrick Stolee
2021-04-10 11:37             ` Ævar Arnfjörð Bjarmason
2021-04-07 13:47   ` Ævar Arnfjörð Bjarmason
2021-04-06 18:47 ` [PATCH v2 0/5] Maintenance: adapt custom refspecs Derrick Stolee via GitGitGadget
2021-04-06 18:47   ` [PATCH v2 1/5] maintenance: simplify prefetch logic Derrick Stolee via GitGitGadget
2021-04-07 23:23     ` Emily Shaffer
2021-04-09 19:00       ` Derrick Stolee
2021-04-06 18:47   ` [PATCH v2 2/5] test-lib: use exact match for test_subcommand Derrick Stolee via GitGitGadget
2021-04-06 18:47   ` [PATCH v2 3/5] refspec: output a refspec item Derrick Stolee via GitGitGadget
2021-04-06 18:47   ` [PATCH v2 4/5] test-tool: test refspec input/output Derrick Stolee via GitGitGadget
2021-04-07 23:08     ` Josh Steadmon
2021-04-07 23:26     ` Emily Shaffer
2021-04-06 18:47   ` [PATCH v2 5/5] maintenance: allow custom refspecs during prefetch Derrick Stolee via GitGitGadget
2021-04-06 19:36     ` Tom Saeger
2021-04-06 19:45       ` Derrick Stolee
2021-04-07 23:09     ` Josh Steadmon
2021-04-07 23:37     ` Emily Shaffer
2021-04-08  0:23     ` Jonathan Tan
2021-04-10  2:03   ` Derrick Stolee via GitGitGadget [this message]
2021-04-10  2:03     ` [PATCH v3 1/3] maintenance: simplify prefetch logic Derrick Stolee via GitGitGadget
2021-04-12 20:13       ` Tom Saeger
2021-04-12 20:27         ` Derrick Stolee
2021-04-10  2:03     ` [PATCH v3 2/3] fetch: add --prefetch option Derrick Stolee via GitGitGadget
2021-04-11 21:09       ` Ramsay Jones
2021-04-12 20:23         ` Derrick Stolee
2021-04-10  2:03     ` [PATCH v3 3/3] maintenance: use 'git fetch --prefetch' Derrick Stolee via GitGitGadget
2021-04-11  1:35     ` [PATCH v3 0/3] Maintenance: adapt custom refspecs Junio C Hamano
2021-04-12 16:48       ` Tom Saeger
2021-04-12 17:24         ` Tom Saeger
2021-04-12 17:41           ` Tom Saeger
2021-04-12 20:25             ` Derrick Stolee
2021-04-16 12:49     ` [PATCH v4 0/4] " Derrick Stolee via GitGitGadget
2021-04-16 12:49       ` [PATCH v4 1/4] maintenance: simplify prefetch logic Derrick Stolee via GitGitGadget
2021-04-16 18:02         ` Tom Saeger
2021-04-16 12:49       ` [PATCH v4 2/4] fetch: add --prefetch option Derrick Stolee via GitGitGadget
2021-04-16 17:52         ` Tom Saeger
2021-04-16 18:26           ` Tom Saeger
2021-04-16 12:49       ` [PATCH v4 3/4] maintenance: use 'git fetch --prefetch' Derrick Stolee via GitGitGadget
2021-04-16 12:49       ` [PATCH v4 4/4] maintenance: respect remote.*.skipFetchAll Derrick Stolee via GitGitGadget
2021-04-16 13:54         ` Ævar Arnfjörð Bjarmason
2021-04-16 14:33           ` Tom Saeger
2021-04-16 18:31         ` Tom Saeger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.924.v3.git.1618020225.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=steadmon@google.com \
    --cc=stolee@gmail.com \
    --cc=sunshine@sunshineco.com \
    --cc=tom.saeger@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).