git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: peff@peff.net, jrnieder@google.com, stolee@gmail.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH 00/15] [RFC] Maintenance jobs and job runner
Date: Fri, 03 Apr 2020 20:47:59 +0000	[thread overview]
Message-ID: <pull.597.git.1585946894.gitgitgadget@gmail.com> (raw)

Background maintenance. I've blogged about it [1]. I've talked about it [2].
I brought it up at the contributor summit [3]. I think it's important. It's
a critical feature in Scalar and VFS for Git. It's the main reason we
created the "scalar register" feature for existing repos (so the Scalar
background maintenance would apply to a "vanilla" Git repo).

[1] https://devblogs.microsoft.com/devops/introducing-scalar/[2] 
https://stolee.dev/docs/git-merge-2020.pdf[3] 
https://lore.kernel.org/git/35FDF767-CE0C-4C51-88A8-12965CD2D4FF@jramsay.com.au/

This RFC does the "maintenance" half of "background maintenance". It creates
two new builtins to assist users in scheduling their own background
maintenance:

 * git run-job <job-name>: This builtin will run a single instance of a
   maintenance job.
   
   
 * git job-runner [--repo=<path>]: This builtin will run an infinite loop
   that executes git run-job as a subcommand.
   
   

I believe that I could replace most maintenance steps in Scalar with the git
run-job command and all that we would lose is some logging. (The only one
that would remain is the one that sets recommended config settings, but that
could be changed to a one-time operation at service start-up.) I haven't
tested this yet, but I plan to before sending a non-RFC option.

Of course, just because we think these maintenance operations work for our
scenario does not mean that they immediately work as the "canonical"
maintenance activities. I attempt to demonstrate the value of these jobs as
they are implemented. My perspective is skewed towards very large
repositories (2000+ full-time contributors), but even medium-size
repositories (100-200 full-time contributors) can benefit from these jobs.

I've tried to split the series into logical commits. This includes each
specific maintenance job being completely described in its own commit (plus
maybe an extra one to allow a custom option).

Most commit messages have "RFC QUESTION(S)" for things I was unsure about.

I'm currently testing this locally by setting job.repo in my global config
to be a few important repos on my Linux VM then running 
GIT_TRACE2_PERF=<path> git job-runner --daemonize to launch a background
process that logs the subcommands to <path>.

Here I will call out things that could definitely be improved:

 1. The git job-runner process is not tested AT ALL. It's a bit tricky to
    test it since it shouldn't exit when things work as expected. I expect
    to create a --no-loop option to stop after one iteration of the job
    loop. But I would like a bit more feedback on the concept before jumping
    into those tests. (I do test the git run-job builtin for each job, but
    not the customization through arguments and config.) Perhaps we could do
    some funny business about mocking git using --exec-path to check that it
    is calling git run-job in the correct order (and more importantly, not 
    calling it when certain config settings are present).
    
    
 2. The --daemonize option at the end is shamelessly stolen from git gc
    --daemonize and git daemon, but has limited abilities on some platforms
    (I've tested on Linux and macOS). I have not done my research on how far
    this gets us to allowing users to launch this at startup or something. 
    
    
 3. As I said, this is the "maintenance" "half" of "background maintenance".
    The "background" part is harder in my opinion because it involves
    creating platform-specific ways to consistently launch background
    processes. For example, Unix systems should have one way to service
    start X while Windows has another. macOS has launchd to launch processes
    as users log in, which should be a good way forward. Scalar implements a
    Windows Service that runs as root but impersonates the latest user to
    log in, and it implements a macOS "service" that is running only with
    the current user. I expect to need to create these services myself as a
    follow-up, but I lack the expertise to do it (currently). If someone
    else has experience creating these things and wants to take over or
    advise that half then I would appreciate the help!
    
    
 4. I noticed late in the RFC process that I'm not clearing my argv_arrays
    carefully in the job-runner. This will need to be rectified and
    carefully checked with valgrind before merging this code. While it leaks
    memory very slowly, it will be important that we don't leak any memory
    at all since this is a long-lived process. There's also some places
    where I was careful to not include too much of libgit.a to help keep the
    memory footprint low.
    
    

Thanks, -Stolee

Derrick Stolee (15):
  run-job: create barebones builtin
  run-job: implement commit-graph job
  run-job: implement fetch job
  run-job: implement loose-objects job
  run-job: implement pack-files job
  run-job: auto-size or use custom pack-files batch
  config: add job.pack-files.batchSize option
  job-runner: create builtin for job loop
  job-runner: load repos from config by default
  job-runner: use config to limit job frequency
  job-runner: use config for loop interval
  job-runner: add --interval=<span> option
  job-runner: skip a job if job.<job-name>.enabled is false
  job-runner: add --daemonize option
  runjob: customize the loose-objects batch size

 .gitignore                       |   2 +
 Documentation/config.txt         |   2 +
 Documentation/config/job.txt     |  37 +++
 Documentation/git-job-runner.txt |  63 +++++
 Documentation/git-run-job.txt    | 102 +++++++
 Makefile                         |   2 +
 builtin.h                        |   2 +
 builtin/job-runner.c             | 347 +++++++++++++++++++++++
 builtin/run-job.c                | 458 +++++++++++++++++++++++++++++++
 cache.h                          |   4 +-
 command-list.txt                 |   2 +
 commit-graph.c                   |   2 +-
 commit-graph.h                   |   1 +
 daemon.h                         |   7 +
 git.c                            |   2 +
 midx.c                           |   2 +-
 midx.h                           |   1 +
 t/t7900-run-job.sh               | 137 +++++++++
 18 files changed, 1168 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/config/job.txt
 create mode 100644 Documentation/git-job-runner.txt
 create mode 100644 Documentation/git-run-job.txt
 create mode 100644 builtin/job-runner.c
 create mode 100644 builtin/run-job.c
 create mode 100644 daemon.h
 create mode 100755 t/t7900-run-job.sh


base-commit: 9fadedd637b312089337d73c3ed8447e9f0aa775
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-597%2Fderrickstolee%2Fjobs-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-597/derrickstolee/jobs-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/597
-- 
gitgitgadget

             reply	other threads:[~2020-04-03 20:48 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-03 20:47 Derrick Stolee via GitGitGadget [this message]
2020-04-03 20:48 ` [PATCH 01/15] run-job: create barebones builtin Derrick Stolee via GitGitGadget
2020-04-05 15:10   ` Phillip Wood
2020-04-05 19:21     ` Junio C Hamano
2020-04-06 14:42       ` Derrick Stolee
2020-04-07  0:58         ` Danh Doan
2020-04-07 10:54           ` Derrick Stolee
2020-04-07 14:16             ` Danh Doan
2020-04-07 14:30               ` Johannes Schindelin
2020-04-03 20:48 ` [PATCH 02/15] run-job: implement commit-graph job Derrick Stolee via GitGitGadget
2020-05-20 19:08   ` Josh Steadmon
2020-04-03 20:48 ` [PATCH 03/15] run-job: implement fetch job Derrick Stolee via GitGitGadget
2020-04-05 15:14   ` Phillip Wood
2020-04-06 12:48     ` Derrick Stolee
2020-04-05 20:28   ` Junio C Hamano
2020-04-06 12:46     ` Derrick Stolee
2020-05-20 19:08   ` Josh Steadmon
2020-04-03 20:48 ` [PATCH 04/15] run-job: implement loose-objects job Derrick Stolee via GitGitGadget
2020-04-05 20:33   ` Junio C Hamano
2020-04-03 20:48 ` [PATCH 05/15] run-job: implement pack-files job Derrick Stolee via GitGitGadget
2020-05-27 22:17   ` Josh Steadmon
2020-04-03 20:48 ` [PATCH 06/15] run-job: auto-size or use custom pack-files batch Derrick Stolee via GitGitGadget
2020-04-03 20:48 ` [PATCH 07/15] config: add job.pack-files.batchSize option Derrick Stolee via GitGitGadget
2020-04-03 20:48 ` [PATCH 08/15] job-runner: create builtin for job loop Derrick Stolee via GitGitGadget
2020-04-03 20:48 ` [PATCH 09/15] job-runner: load repos from config by default Derrick Stolee via GitGitGadget
2020-04-05 15:18   ` Phillip Wood
2020-04-06 12:49     ` Derrick Stolee
2020-04-05 15:41   ` Phillip Wood
2020-04-06 12:57     ` Derrick Stolee
2020-04-03 20:48 ` [PATCH 10/15] job-runner: use config to limit job frequency Derrick Stolee via GitGitGadget
2020-04-05 15:24   ` Phillip Wood
2020-04-03 20:48 ` [PATCH 11/15] job-runner: use config for loop interval Derrick Stolee via GitGitGadget
2020-04-03 20:48 ` [PATCH 12/15] job-runner: add --interval=<span> option Derrick Stolee via GitGitGadget
2020-04-03 20:48 ` [PATCH 13/15] job-runner: skip a job if job.<job-name>.enabled is false Derrick Stolee via GitGitGadget
2020-04-03 20:48 ` [PATCH 14/15] job-runner: add --daemonize option Derrick Stolee via GitGitGadget
2020-04-03 20:48 ` [PATCH 15/15] runjob: customize the loose-objects batch size Derrick Stolee via GitGitGadget
2020-04-03 21:40 ` [PATCH 00/15] [RFC] Maintenance jobs and job runner Junio C Hamano
2020-04-04  0:16   ` Derrick Stolee
2020-04-07  0:50     ` Danh Doan
2020-04-07 10:59       ` Derrick Stolee
2020-04-07 14:26         ` Danh Doan
2020-04-07 14:43           ` Johannes Schindelin
2020-04-07  1:48     ` brian m. carlson
2020-04-07 20:08       ` Junio C Hamano
2020-04-07 22:23       ` Johannes Schindelin
2020-04-08  0:01         ` brian m. carlson
2020-05-27 22:39           ` Josh Steadmon
2020-05-28  0:47             ` Junio C Hamano
2020-05-27 21:52               ` Johannes Schindelin
2020-05-28 14:48                 ` Junio C Hamano
2020-05-28 14:50                 ` Jonathan Nieder
2020-05-28 14:57                   ` Junio C Hamano
2020-05-28 15:03                     ` Jonathan Nieder
2020-05-28 15:30                       ` Derrick Stolee
2020-05-28  4:39                         ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.597.git.1585946894.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@google.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).