From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: sandals@crustytoothpaste.net, steadmon@google.com,
	jrnieder@gmail.com, peff@peff.net, congdanhqx@gmail.com,
	phillip.wood123@gmail.com, emilyshaffer@google.com,
	sluongng@gmail.com, jonathantanmy@google.com,
	Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks
Date: Tue, 18 Aug 2020 14:22:57 +0000
Message-ID: <pull.695.v2.git.1597760589.gitgitgadget@gmail.com>
In-Reply-To: <pull.695.git.1596728921.gitgitgadget@gmail.com>

This series is based on jk/strvec.

This patch series contains 11 patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help focus
the conversation, I'm splitting out the portions that create and test the
'maintenance' builtin from the additional tasks (prefetch, loose-objects,
incremental-repack) that can be brought in later.

[1] https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/

As mentioned before, git gc already plays the role of maintaining Git
repositories. It has accumulated several smaller pieces in its long history,
including:

 1. Repacking all reachable objects into one pack-file (and deleting
    unreachable objects).
 2. Packing refs.
 3. Expiring reflogs.
 4. Clearing rerere logs.
 5. Updating the commit-graph file.
 6. Pruning worktrees.

While expiring reflogs, clearing rerere logs, and deleting unreachable
objects are suitable under the guise of "garbage collection", packing refs
and updating the commit-graph file are not as obviously fitting. Further,
these operations are "all or nothing" in that they rewrite almost all
repository data, which does not perform well at extremely large scales.
These operations can also be disruptive to foreground Git commands when git
gc --auto triggers during routine use.

This series does not intend to change what git gc does, but instead creates
new choices for automatic maintenance activities, of which git gc remains
the only one enabled by default.

The new maintenance tasks are:

 * 'commit-graph' : write and verify a single layer of an incremental
   commit-graph.
 * 'loose-objects' : prune packed loose objects, then create a new pack from
   a batch of loose objects.
 * 'pack-files' : expire redundant packs from the multi-pack-index, then
   repack using the multi-pack-index's incremental repack strategy.
 * 'prefetch' : fetch from each remote, storing the refs in 'refs/prefetch/
   /'.

This series includes only the 'gc' and 'commit-graph' tasks; the rest will
arrive in a follow-up series. Including the 'commit-graph' task here allows
us to build and test features like config settings and the --task=
command-line argument.

The new tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=". Additional
config options customize the conditions under which the tasks run with the
'--auto' option.
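As a rough illustration of the selection rule described above (not the code in builtin/gc.c), the following C sketch runs every enabled task when no task is named, and runs a named task regardless of its enabled flag. The struct layout, the counters, and the run_tasks() helper are hypothetical names invented for this example; the `enabled` field only stands in for a config lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical task record; the real layout in builtin/gc.c may differ. */
struct task {
	const char *name;
	int (*fn)(void);	/* returns 0 on success */
	int enabled;		/* stands in for a per-task config setting */
};

/* Counters so a caller can observe which tasks ran. */
static int gc_calls, commit_graph_calls;

static int task_gc(void) { gc_calls++; return 0; }
static int task_commit_graph(void) { commit_graph_calls++; return 0; }

static struct task tasks[] = {
	{ "gc", task_gc, 1 },			/* the only task on by default */
	{ "commit-graph", task_commit_graph, 0 },
};
#define NUM_TASKS (sizeof(tasks) / sizeof(tasks[0]))

/*
 * With selected == NULL, run every enabled task. Otherwise run only the
 * task with that name, even if it is disabled. Returns -1 when a name
 * matches no task, else the OR of the task results (0 means success).
 */
static int run_tasks(const char *selected)
{
	int result = 0, found = 0;
	size_t i;

	for (i = 0; i < NUM_TASKS; i++) {
		assert(tasks[i].fn != NULL);
		if (selected) {
			if (strcmp(tasks[i].name, selected))
				continue;
			found = 1;
		} else if (!tasks[i].enabled) {
			continue;
		}
		result |= tasks[i].fn();
	}
	if (selected && !found)
		return -1;
	return result;
}
```

Calling run_tasks(NULL) here runs only 'gc', while run_tasks("commit-graph") runs the disabled task explicitly, which mirrors the default and --task= behaviors described in the paragraph above.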

Because 'gc' is implemented as a maintenance task, the most dramatic change
of this series is to convert the 'git gc --auto' calls into 'git maintenance
run --auto' calls at the end of some Git commands. By default, the only
change is that 'git gc --auto' now runs as a child of an additional 'git
maintenance' process.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves.

Updates since v1 (of this series)
=================================

 * Documentation fixes.
   
   
 * The builtin code had some slight tweaks in PATCH 1.
   
   

Updates since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires having the exact command-line
   figured out, which now requires spelling out all --no-[quiet|progress]
   options. I also added a bunch of "2>/dev/null" checks because of the
   isatty(2) calls. Without that, the behavior will change depending on
   whether the test is run with -x/-v or without.
   
   
 * The option parsing has changed to use a local struct and pass that struct
   to the helper methods. This is instead of having a global singleton.
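
The local-struct pattern mentioned in the last item can be sketched in plain C as follows. This is a simplified stand-in, not the series' code: it hand-parses arguments instead of using Git's parse-options API, and parse_maintenance_args() and maintenance_main() are hypothetical names. The exit codes 129 and 128 follow the help-text test shown in the range-diff below, and the argument list is assumed to exclude the program name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct maintenance_opts {
	int auto_flag;
	int quiet;
};

/* The helper receives its options explicitly instead of reading a global. */
static int maintenance_run(const struct maintenance_opts *opts)
{
	assert(opts != NULL);
	if (!opts->quiet)
		fprintf(stderr, "running maintenance%s\n",
			opts->auto_flag ? " (auto)" : "");
	return 0;
}

/*
 * Hypothetical hand-rolled parser. Zeroes *opts, fills it from argv, and
 * stores the single subcommand. Returns 0 on success, -1 on a usage error.
 */
static int parse_maintenance_args(int argc, const char **argv,
				  struct maintenance_opts *opts,
				  const char **subcommand)
{
	int i;

	assert(opts != NULL && subcommand != NULL);
	memset(opts, 0, sizeof(*opts));
	*subcommand = NULL;

	for (i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--auto"))
			opts->auto_flag = 1;
		else if (!strcmp(argv[i], "--quiet"))
			opts->quiet = 1;
		else if (argv[i][0] == '-')
			return -1;	/* unknown option */
		else if (*subcommand)
			return -1;	/* more than one subcommand */
		else
			*subcommand = argv[i];
	}
	return *subcommand ? 0 : -1;
}

static int maintenance_main(int argc, const char **argv)
{
	struct maintenance_opts opts;	/* local, not a static singleton */
	const char *subcommand;

	if (parse_maintenance_args(argc, argv, &opts, &subcommand))
		return 129;	/* usage error */
	if (!strcmp(subcommand, "run"))
		return maintenance_run(&opts);
	return 128;		/* invalid subcommand */
}
```

Because the struct is an automatic variable, the sketch zeroes it before use; a static singleton would have been zero-initialized implicitly.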
   
   

Thanks, -Stolee

Derrick Stolee (11):
  maintenance: create basic maintenance runner
  maintenance: add --quiet option
  maintenance: replace run_auto_gc()
  maintenance: initialize task array
  maintenance: add commit-graph task
  maintenance: add --task option
  maintenance: take a lock on the objects directory
  maintenance: create maintenance.<task>.enabled config
  maintenance: use pointers to check --auto
  maintenance: add auto condition for commit-graph task
  maintenance: add trace2 regions for task execution

 .gitignore                           |   1 +
 Documentation/config.txt             |   2 +
 Documentation/config/maintenance.txt |  14 ++
 Documentation/fetch-options.txt      |   6 +-
 Documentation/git-clone.txt          |   6 +-
 Documentation/git-maintenance.txt    |  79 ++++++
 builtin.h                            |   1 +
 builtin/am.c                         |   2 +-
 builtin/commit.c                     |   2 +-
 builtin/fetch.c                      |   6 +-
 builtin/gc.c                         | 354 +++++++++++++++++++++++++++
 builtin/merge.c                      |   2 +-
 builtin/rebase.c                     |   4 +-
 commit-graph.c                       |   8 +-
 commit-graph.h                       |   1 +
 git.c                                |   1 +
 object.h                             |   1 +
 run-command.c                        |  16 +-
 run-command.h                        |   2 +-
 t/t5510-fetch.sh                     |   2 +-
 t/t5514-fetch-multiple.sh            |   2 +-
 t/t7900-maintenance.sh               |  63 +++++
 t/test-lib-functions.sh              |  33 +++
 23 files changed, 580 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh


base-commit: d70a9eb611a9d242c1d26847d223b8677609305b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-695%2Fderrickstolee%2Fmaintenance%2Fbuiltin-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-695/derrickstolee/maintenance/builtin-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/695

Range-diff vs v1:

  1:  2b9deb6d6a !  1:  e09e4a4a87 maintenance: create basic maintenance runner
     @@ Documentation/git-maintenance.txt (new)
      +-----------
      +Run tasks to optimize Git repository data, speeding up other Git commands
      +and reducing storage requirements for the repository.
     -++
     ++
      +Git commands that add repository data, such as `git add` or `git fetch`,
      +are optimized for a responsive user experience. These commands do not take
      +time to optimize the Git data, since such optimizations scale with the full
      +size of the repository while these user commands each perform a relatively
      +small action.
     -++
     ++
      +The `git maintenance` command provides flexibility for how to optimize the
      +Git repository.
      +
     @@ Documentation/git-maintenance.txt (new)
      +-----
      +
      +gc::
     -+	Cleanup unnecessary files and optimize the local repository. "GC"
     ++	Clean up unnecessary files and optimize the local repository. "GC"
      +	stands for "garbage collection," but this task performs many
     -+	smaller tasks. This task can be rather expensive for large
     -+	repositories, as it repacks all Git objects into a single pack-file.
     -+	It can also be disruptive in some situations, as it deletes stale
     -+	data.
     ++	smaller tasks. This task can be expensive for large repositories,
     ++	as it repacks all Git objects into a single pack-file. It can also
     ++	be disruptive in some situations, as it deletes stale data. See
     ++	linkgit:git-gc[1] for more details on garbage collection in Git.
      +
      +OPTIONS
      +-------
     @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
      +
      +int cmd_maintenance(int argc, const char **argv, const char *prefix)
      +{
     -+	static struct maintenance_opts opts;
     -+	static struct option builtin_maintenance_options[] = {
     ++	struct maintenance_opts opts;
     ++	struct option builtin_maintenance_options[] = {
      +		OPT_BOOL(0, "auto", &opts.auto_flag,
      +			 N_("run tasks based on the state of the repository")),
      +		OPT_END()
     @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
      +			     builtin_maintenance_usage,
      +			     PARSE_OPT_KEEP_UNKNOWN);
      +
     -+	if (argc == 1) {
     -+		if (!strcmp(argv[0], "run"))
     -+			return maintenance_run(&opts);
     -+	}
     ++	if (argc != 1)
     ++		usage_with_options(builtin_maintenance_usage,
     ++				   builtin_maintenance_options);
     ++
     ++	if (!strcmp(argv[0], "run"))
     ++		return maintenance_run(&opts);
      +
     -+	usage_with_options(builtin_maintenance_usage,
     -+			   builtin_maintenance_options);
     ++	die(_("invalid subcommand: %s"), argv[0]);
      +}
      
       ## git.c ##
     @@ t/t7900-maintenance.sh (new)
      +
      +test_expect_success 'help text' '
      +	test_expect_code 129 git maintenance -h 2>err &&
     -+	test_i18ngrep "usage: git maintenance run" err
     ++	test_i18ngrep "usage: git maintenance run" err &&
     ++	test_expect_code 128 git maintenance barf 2>err &&
     ++	test_i18ngrep "invalid subcommand: barf" err
      +'
      +
      +test_expect_success 'run [--auto]' '
  2:  d5faef26af !  2:  adae48d235 maintenance: add --quiet option
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
       	close_object_store(the_repository->objects);
       	return run_command(&child);
      @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     - 	static struct option builtin_maintenance_options[] = {
     + 	struct option builtin_maintenance_options[] = {
       		OPT_BOOL(0, "auto", &opts.auto_flag,
       			 N_("run tasks based on the state of the repository")),
      +		OPT_BOOL(0, "quiet", &opts.quiet,
     @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefi
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'help text' '
     - 	test_i18ngrep "usage: git maintenance run" err
     + 	test_i18ngrep "invalid subcommand: barf" err
       '
       
      -test_expect_success 'run [--auto]' '
  3:  233811310b =  3:  91741a0cfc maintenance: replace run_auto_gc()
  4:  7efa23abc8 =  4:  1db3b96280 maintenance: initialize task array
  5:  902b742032 !  5:  50b457fd57 maintenance: add commit-graph task
     @@ Documentation/git-maintenance.txt: run::
      +	issue, then the chain file is removed and the `commit-graph` is
      +	rewritten from scratch.
      ++
     -+The verification only checks the top layer of the `commit-graph` chain.
     -+If the incremental write merged the new commits with at least one
     -+existing layer, then there is potential for on-disk corruption being
     -+carried forward into the new file. This will be noticed and the new
     -+commit-graph file will be clean as Git reparses the commit data from
     -+the object database.
     -++
      +The incremental write is safe to run alongside concurrent Git processes
      +since it will not expire `.graph` files that were in the previous
      +`commit-graph-chain` file. They will be deleted by a later run based on
      +the expiration delay.
      +
       gc::
     - 	Cleanup unnecessary files and optimize the local repository. "GC"
     + 	Clean up unnecessary files and optimize the local repository. "GC"
       	stands for "garbage collection," but this task performs many
      
       ## builtin/gc.c ##
     @@ t/t7900-maintenance.sh: test_description='git maintenance builtin'
      +
       test_expect_success 'help text' '
       	test_expect_code 129 git maintenance -h 2>err &&
     - 	test_i18ngrep "usage: git maintenance run" err
     + 	test_i18ngrep "usage: git maintenance run" err &&
  6:  dddbcc4f3d !  6:  85268bd53e maintenance: add --task option
     @@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
      +
       int cmd_maintenance(int argc, const char **argv, const char *prefix)
       {
     - 	static struct maintenance_opts opts;
     + 	struct maintenance_opts opts;
      @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
       			 N_("run tasks based on the state of the repository")),
       		OPT_BOOL(0, "quiet", &opts.quiet,
  7:  79af39be13 =  7:  6f86cfaa94 maintenance: take a lock on the objects directory
  8:  69bfc6a4b2 =  8:  5c0f9d69d1 maintenance: create maintenance.<task>.enabled config
  9:  df21bbb000 =  9:  68bf5bef4b maintenance: use pointers to check --auto
 10:  e67e259aef = 10:  fc097c389a maintenance: add auto condition for commit-graph task
 11:  a5d1914846 = 11:  46fbe161aa maintenance: add trace2 regions for task execution

-- 
gitgitgadget
