git.vger.kernel.org archive mirror
From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net, jrnieder@google.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 00/15] [RFC] Maintenance jobs and job runner
Date: Fri, 3 Apr 2020 20:16:21 -0400
Message-ID: <cc9df614-2736-7cdd-006f-59878ee551c8@gmail.com>
In-Reply-To: <xmqqv9mgxn7u.fsf@gitster.c.googlers.com>

On 4/3/2020 5:40 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>>  * git run-job <job-name>: This builtin will run a single instance of a
>>    maintenance job.
>>    
>>  * git job-runner [--repo=<path>]: This builtin will run an infinite loop
>>    that executes git run-job as a subcommand.
> 
> What does this have to do with "git", though?  IOW, why does this
> have to be part of Git, so that those who would benefit from having
> a mechanism that makes it easy to run regular maintenance tasks but
> are not Git users (or those that want to do such maintenance tasks
> that are not necessarily tied to "git") must use "git" to do so?
> 
> I'll find out later why it is so after reading thru 15 patches
> myself, so no need to give a quick answer to the above; it was just
> my knee-jerk reaction.

That's a reasonable reaction. The short version of my reasoning is that
many, many people _use_ Git but are not Git experts. While a Git expert
could find the right set of commands to run and at what frequency to
keep their repo clean, most users do not want to spend time learning
these commands. It's also worth our time as contributors to select what
a good set of non-intrusive maintenance tasks could be, and make them
easily accessible to users.

This series gets us half of the way there: a user interested in doing
background maintenance could figure out how to launch "git run-job" on
a schedule for their platform, or to launch "git job-runner" at start-
up. That's a lot simpler than learning how the commit-graph,
multi-pack-index, prune-packed, pack-objects, and fetch builtins work
with their complicated sets of arguments.
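
To make "launch git run-job on a schedule" concrete, here is a minimal
sketch of what such a wrapper might look like. It is only an
illustration under assumptions: "git run-job" is the builtin proposed in
this series (it is not in released Git), the job names are taken from
the patch titles, and build_command, due_jobs, and run_once are
hypothetical helper names, not anything the series provides.

```python
import subprocess
from typing import Callable, Dict, List

# Assumed job names, taken from the patch titles in this series; the
# names the proposed "git run-job" finally accepts may differ.
JOBS = ["commit-graph", "fetch", "loose-objects", "pack-files"]


def build_command(repo: str, job: str) -> List[str]:
    """Build the argv for one maintenance job in one repository."""
    # "git -C <path>" runs Git as if it were started in <path>.
    # "run-job" is the builtin proposed in this RFC, not released Git.
    return ["git", "-C", repo, "run-job", job]


def due_jobs(last_run: Dict[str, float], interval: float, now: float) -> List[str]:
    """Return jobs whose last run is at least `interval` seconds old."""
    # A job that has never run is treated as infinitely overdue.
    return [
        job for job in JOBS
        if now - last_run.get(job, float("-inf")) >= interval
    ]


def run_once(
    repo: str,
    last_run: Dict[str, float],
    interval: float,
    now: float,
    runner: Callable = subprocess.run,
) -> None:
    """Run every due job once and record when it ran."""
    for job in due_jobs(last_run, interval, now):
        # check=False: a failed job should not stop the remaining jobs.
        runner(build_command(repo, job), check=False)
        last_run[job] = now
```

A platform scheduler (cron, launchd, Task Scheduler) would call
something like run_once() periodically with time.time() as "now". Having
every user get all of that right is exactly the burden the job runner is
meant to remove.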

The second half would be to create a command such as

	git please-run-maintenance-on-this-repo

that initializes the background jobs and enables them on the repo they
are using. This would let even the most casual Git user work efficiently
on very large repositories.

Sometimes it is hard to remember that people use Git because it is an
important tool for getting their work done. Time spent waiting for Git
to finish a slow operation, or blocked on a triggered "git gc --auto",
is time they would rather spend on that work. Background maintenance is
a way to reduce the time users spend blocked on Git and to increase
their productivity on the more important things.

Of course, I'm biased toward very large repositories, where the
existing maintenance process is insufficient. The design of these jobs
is taken directly from what we designed and built for VFS for Git and
Scalar over the winter of 2018-2019. These jobs were incredibly effective
in cleaning up repositories that had been accumulating cruft for over a
year without any maintenance. Those repos have stayed clean, and we
haven't found any further maintenance tasks to be necessary.

I still believe that there are plenty of repos of similar size to the
Linux kernel that are in frequent use and could benefit from these
operations.

Thanks,
-Stolee
