From: Derrick Stolee <firstname.lastname@example.org>
To: Junio C Hamano <email@example.com>,
    Phillip Wood <firstname.lastname@example.org>
Cc: Derrick Stolee via GitGitGadget <email@example.com>,
    firstname.lastname@example.org, email@example.com, firstname.lastname@example.org,
    Derrick Stolee <email@example.com>
Subject: Re: [PATCH 01/15] run-job: create barebones builtin
Date: Mon, 6 Apr 2020 10:42:23 -0400
Message-ID: <firstname.lastname@example.org>
In-Reply-To: <email@example.com>

On 4/5/2020 3:21 PM, Junio C Hamano wrote:
> Phillip Wood <firstname.lastname@example.org> writes:
>
>> Hi Stolee
>>
>> On 03/04/2020 21:48, Derrick Stolee via GitGitGadget wrote:
>>> From: Derrick Stolee <email@example.com>
>>>
>>> The 'git run-job' command will be used to execute a short-lived set
>>> of maintenance activities by a background job manager. The intention
>>> is to perform small batches of work that reduce the foreground time
>>> taken by repository maintenance such as 'git gc --auto'.
>>>
>>> This change does the absolute minimum to create the builtin and show
>>> the usage output.
>>>
>>> Provide an explicit warning that this command is experimental. The
>>> set of jobs may change, and each job could alter its behavior in
>>> future versions.
>>>
>>> RFC QUESTION: This builtin is based on the background maintenance in
>>> Scalar. Specifically, this builtin is based on the "scalar run <job>"
>>> command. My default thought was to make this a "git run <job>"
>>> command to maximize similarity. However, it seems like "git run" is
>>> too generic. Or, am I being overly verbose for no reason?
>>
>> Having read through this series I wondered if we wanted a single git
>> command such as 'git maintenance' (suggestions of better names
>> welcome) and then 'git run-job' could become 'git maintenance run',
>> 'git job-runner' would become another subcommand (run-jobs or
>> schedule-jobs?) and the 'git please-run-maintenance-on-this-repo' you
>> mentioned in your email to Junio could become 'git maintenance init'
>> (or maybe setup)
>
> I had a very similar impression. In addition to what you already
> said, a few more were:
>
>  - Why isn't the existing "git repack" such a "maintenance" command?
>    IOW why do we even need [01/15]? After all, "repack" may have
>    started its life as a tool to reorganize the PACKFILES, but it is
>    no longer limited to '.git/objects/pack/*.pack' files with its
>    knowledge about the loose object files and the "--prune" option.
>    Consolidating pieces of information spread across multiple .idx
>    files, reachability bitmaps and commit graph files, into newer
>    and more performant forms can just be part of "packing the pieces
>    of information in a repository for optimum performance", which is
>    a better way to understand why "repack" has a word 'pack' in its
>    name.

To me, "git repack" is a specific kind of maintenance. The end result
is a pack-file. Now, "git gc" is a bit more general, because it will
create a pack-file but also update the commit-graph file. Even so, its
name is still very specific: it "collects garbage". The goal of this
series is to replace "git gc --auto" with something less invasive.
I'll include an alternate CLI proposal at the end of this message.

>  - Many of the "maintenance" operations this series proposes do make
>    sense, just like other "maintenance" operations we already have
>    in "repack", "prune", "prune-packed" etc., which are welcome
>    additions.

Thanks. I'm glad these steps make sense. They are definitely more
"incremental" updates than a full repack or GC.

>  - Like the individual steps that appear in e.g. "repack", however,
>    some of the individual steps in this series can be triggered by
>    calling underlying tools directly, allowing scripted maintenance
>    commands that suit individual needs better than the canned
>    invocation of "run-job", but I didn't get the impression that the
>    series strives to make sure that all knobs of these individual
>    steps are available to scripters who want to deviate from what
>    "run-job" prescribes. If it is not doing so, we probably should.
>
>  - Again, I do not think we want a reimplementation of cron, at, or
>    inetd that is not specific to "git" at all.

I expected the job-runner to get some push-back. The design for it in
the current RFC matched how we do it in Scalar more than anything else.

You're probably right that it would be better to leave the "background"
part to the platform. Of course, not every platform has "cron", but that
just means we need a cross-platform way to launch Git processes on some
schedule. That could be a command that creates a cron job on platforms
that have it, and on Windows it could create a scheduled task instead.

But what should we launch? It should probably be a Git command that
checks config for a list of repositories, then runs "the maintenance
command" on each of those repos.

I'm inserting a break here to draw the eye to a new proposed design:

---

Create a "git maintenance" builtin. This has a few subcommands:

1. "run" will run the configured maintenance on the current repo.
   This should become the single entry point for users to say
   "please clean up my repo." What _exactly_ it does can be altered
   with config. I'll list some possibilities after listing the
   subcommands.

2. "run-on-repos" uses command-line arguments or config to launch
   "git -C <dir> maintenance run" for all configured directories. The
   intention is that this is launched on some schedule by a
   platform-specific scheduling mechanism (e.g. cron). (This
   subcommand could use a better name.)

3. "schedule" adds the current repository to the configured list of
   repositories for running with "run-on-repos". It will also
   initialize the platform-specific scheduling mechanism. This may be
   to start the schedule for the first time OR to update how
   frequently "run-on-repos" is run, as appropriate.

4. (OPTIONAL) "mode <mode>" adjusts the config for the current repo
   to change the type of maintenance requested for this repo. For
   example, "simple" could just run "git gc --auto" using a normal
   range. "incremental" could run the maintenance tasks from this
   series. Finally, "server" could run maintenance tasks as if we are
   serving the repo to others, so we repack aggressively with full
   bitmaps, and more frequently.

Here are some possible maintenance tasks. Not all of them would be
appropriate to run on the same repo, or at least not with the same
frequency:

* "fetch" : the background fetch from PATCH 3. Appropriate for all
  modes, but perhaps users should have to opt in to this in the
  "simple" mode.

* "commit-graph" : the incremental commit-graph writes from PATCH 2.
  Appropriate whenever the "fetch" task is being run, but also
  valuable for the "server" mode.

* "gc" : Run "git gc --auto". This would be enabled by default, but
  should be disabled for the "incremental" and "server" modes.

* "repack" : Run "git repack <options>" with appropriate options
  based on config. The "server" mode would include custom delta and
  bitmap options. (I will leave the specifics to those who maintain
  servers to recommend the best options for "server" mode.)

* "loose-objects" : see PATCH 4. Appropriate for "incremental" mode.

* "multi-pack-index" or "incremental-repack" : Run the "pack-files"
  job from PATCH 5. Appropriate for "incremental" mode.

* "pack-refs" : create a packed-refs file or repack the reftable as
  appropriate for those features. (I have less familiarity with
  these.)
Notice that with this new set of options we could do something rather
dramatic: replace all calls to "git gc --auto" with
"git maintenance run --auto". By default, these would be equivalent.
However, "git maintenance run --auto" makes it clearer that the
behavior is less specific than "git gc" and could be configured to do
something different.

I used an "--auto" option in the suggestion above to distinguish the
command being run as a foreground operation from the command being run
as a background operation. Part of setting up a schedule would include
disabling these "foreground" maintenance tasks and relying entirely on
the background tasks instead. The best situation would be to avoid
launching the subprocess at all.

---

What do people think of this alternative? Does this get us closer to
an appropriate level of work for Git to do?

Thanks,
-Stolee