From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Johannes.Schindelin@gmx.de, sandals@crustytoothpaste.net,
    steadmon@google.com, jrnieder@gmail.com, peff@peff.net,
    congdanhqx@gmail.com, phillip.wood123@gmail.com,
    Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH 00/21] Maintenance builtin, allowing 'gc --auto' customization
Date: Tue, 07 Jul 2020 14:21:14 +0000
Message-ID: <pull.671.git.1594131695.gitgitgadget@gmail.com>

This is a second attempt at redesigning Git's repository maintenance
patterns. The first attempt [1] included a way to run jobs in the
background using a long-lived process; that idea was rejected and is not
included in this series. A future series will use the OS to handle
scheduling tasks.

[1] https://lore.kernel.org/git/pull.597.git.1585946894.gitgitgadget@gmail.com/

As mentioned before, git gc already plays the role of maintaining Git
repositories. It has accumulated several smaller pieces in its long
history, including:

 1. Repacking all reachable objects into one pack-file (and deleting
    unreachable objects).
 2. Packing refs.
 3. Expiring reflogs.
 4. Clearing rerere logs.
 5. Updating the commit-graph file.

While expiring reflogs, clearing rerere logs, and deleting unreachable
objects are suitable under the guise of "garbage collection", packing
refs and updating the commit-graph file are not as obviously fitting.
Further, these operations are "all or nothing" in that they rewrite
almost all repository data, which does not perform well at extremely
large scales. These operations can also be disruptive to foreground Git
commands when git gc --auto triggers during routine use.

This series does not intend to change what git gc does, but instead
creates new choices for automatic maintenance activities, of which
git gc remains the only one enabled by default.

The new maintenance tasks are:

 * 'commit-graph' : write and verify a single layer of an incremental
   commit-graph.
 * 'loose-objects' : prune packed loose objects, then create a new pack
   from a batch of loose objects.
 * 'pack-files' : expire redundant packs from the multi-pack-index, then
   repack using the multi-pack-index's incremental repack strategy.
 * 'fetch' : fetch from each remote, storing the refs in
   'refs/hidden//'.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=". There are
additional config options to allow customizing the conditions under
which the tasks run with the '--auto' option. ('fetch' will never run
with the '--auto' option.)

Because 'gc' is implemented as a maintenance task, the most dramatic
change of this series is to convert the 'git gc --auto' calls into
'git maintenance run --auto' calls at the end of some Git commands. By
default, the only change is that 'git gc --auto' will be run below an
additional 'git maintenance' process.

The 'git maintenance' builtin has a 'run' subcommand so it can be
extended later with subcommands that manage background maintenance, such
as 'start', 'stop', 'pause', or 'schedule'. These are not the subject of
this series, as it is important to focus on the maintenance activities
themselves.

An expert user could set up scheduled background maintenance themselves
with the current series. I have the following crontab data set up to run
maintenance on an hourly basis:

  0 * * * * git -C /<path-to-repo> maintenance run --no-quiet >>/<path-to-repo>/.git/maintenance.log

My config includes all tasks except the 'gc' task. The hourly run is
over-aggressive, but is sufficient for testing. I'll replace it with
daily when I feel satisfied.

Hopefully this direction is seen as a positive one. My goal was to add
more options for expert users, along with the flexibility to create
background maintenance via the OS in a later series.
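A minimal sketch of the setup described above, assuming the
'maintenance.<task>.enabled' keys added later in this series accept
boolean values. The function names and the /srv/example-repo path are
hypothetical, and the task names are the ones used in this version of
the series. The functions only print the commands and the crontab entry
so they can be reviewed before anything is applied; the printed commands
assume a repository path without whitespace.

```shell
#!/bin/sh
# Sketch only: prints commands for review, executes no git commands.

# Print the config commands that enable every new task and disable 'gc'.
# Assumes maintenance.<task>.enabled takes a boolean value.
print_task_config () {
	repo=$1
	for task in commit-graph loose-objects pack-files fetch
	do
		printf 'git -C %s config maintenance.%s.enabled true\n' "$repo" "$task"
	done
	printf 'git -C %s config maintenance.gc.enabled false\n' "$repo"
}

# Print an hourly crontab entry matching the one quoted in the cover letter.
print_crontab_entry () {
	repo=$1
	printf '0 * * * * git -C %s maintenance run --no-quiet >>%s/.git/maintenance.log\n' \
		"$repo" "$repo"
}

# Hypothetical repository path, used for illustration only.
print_task_config /srv/example-repo
print_crontab_entry /srv/example-repo
```

Once the output has been reviewed, the config lines could be run in a
shell and the crontab line added with 'crontab -e'.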
OUTLINE
=======

Patches 1-4 remove some references to the_repository in builtin/gc.c
before we start depending on code in that builtin.

Patches 5-7 create the 'git maintenance run' builtin and subcommand as a
simple shim over 'git gc' and replace calls to 'git gc --auto' from
other commands.

Patches 8-15 create new maintenance tasks. These are the same tasks sent
in the previous RFC.

Patches 16-21 create more customization through config and perform other
polish items.

FUTURE WORK
===========

 * Add 'start', 'stop', and 'schedule' subcommands to initialize the
   commands run in the background.
 * Split the 'gc' builtin into smaller maintenance tasks that are enabled
   by default, but might have different '--auto' conditions and more
   config options.
 * Replace config like 'gc.writeCommitGraph' and 'fetch.writeCommitGraph'
   with use of the 'commit-graph' task.

Thanks,
-Stolee

Derrick Stolee (21):
  gc: use the_repository less often
  gc: use repository in too_many_loose_objects()
  gc: use repo config
  gc: drop the_repository in log location
  maintenance: create basic maintenance runner
  maintenance: add --quiet option
  maintenance: replace run_auto_gc()
  maintenance: initialize task array and hashmap
  maintenance: add commit-graph task
  maintenance: add --task option
  maintenance: take a lock on the objects directory
  maintenance: add fetch task
  maintenance: add loose-objects task
  maintenance: add pack-files task
  maintenance: auto-size pack-files batch
  maintenance: create maintenance.<task>.enabled config
  maintenance: use pointers to check --auto
  maintenance: add auto condition for commit-graph task
  maintenance: create auto condition for loose-objects
  maintenance: add pack-files auto condition
  midx: use start_delayed_progress()

 .gitignore                           |   1 +
 Documentation/config.txt             |   2 +
 Documentation/config/maintenance.txt |  32 +
 Documentation/fetch-options.txt      |   5 +-
 Documentation/git-clone.txt          |   7 +-
 Documentation/git-maintenance.txt    | 124 ++++
 builtin.h                            |   1 +
 builtin/am.c                         |   2 +-
 builtin/commit.c                     |   2 +-
 builtin/fetch.c                      |   6 +-
 builtin/gc.c                         | 881 +++++++++++++++++++++++++--
 builtin/merge.c                      |   2 +-
 builtin/rebase.c                     |   4 +-
 commit-graph.c                       |   8 +-
 commit-graph.h                       |   1 +
 config.c                             |  24 +-
 config.h                             |   2 +
 git.c                                |   1 +
 midx.c                               |  12 +-
 midx.h                               |   1 +
 object.h                             |   1 +
 run-command.c                        |   7 +-
 run-command.h                        |   2 +-
 t/t5319-multi-pack-index.sh          |  14 +-
 t/t5510-fetch.sh                     |   2 +-
 t/t5514-fetch-multiple.sh            |   2 +-
 t/t7900-maintenance.sh               | 211 +++++++
 27 files changed, 1265 insertions(+), 92 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh

base-commit: 4a0fcf9f760c9774be77f51e1e88a7499b53d2e2
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-671%2Fderrickstolee%2Fmaintenance%2Fgc-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-671/derrickstolee/maintenance/gc-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/671
-- 
gitgitgadget