git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Emily Shaffer <emilyshaffer@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v4 1/9] doc: propose hooks managed by the config
Date: Wed, 07 Oct 2020 11:23:10 +0200	[thread overview]
Message-ID: <87mu0ygzk1.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20200909004939.1942347-2-emilyshaffer@google.com>


On Wed, Sep 09 2020, Emily Shaffer wrote:

First, thanks a lot for working on this. As you may have found I've done
some small amount of actual work in this area before, but mostly just
blathered about it on the ML.

> Begin a design document for config-based hooks, managed via git-hook.
> Focus on an overview of the implementation and motivation for design
> decisions. Briefly discuss the alternatives considered before this
> point. Also, attempt to redefine terms to fit into a multihook world.
> [...]
> +[[status-quo]]
> +=== Status quo
> +
> +Today users can implement multihooks themselves by using a "trampoline script"
> +as their hook, and pointing that script to a directory or list of other scripts
> +they wish to run.

...or by setting core.hooksPath in their local/global/system
config. Granted it doesn't cover the malicious hook injection case
you're also trying to solve, but does address e.g. having a git server
with a lot of centralized hooks.

The "trampoline script" also isn't needed for the common case you
mention, you just symlink the .git/hooks directory (as e.g. GitLab
does). People usually use a trampoline script for e.g. using GNU
parallel or something to execute N hooks.


> +[[hook-directories]]
> +=== Hook directories
> +
> +Other contributors have suggested Git learn about the existence of a directory
> +such as `.git/hooks/<hookname>.d` and execute those hooks in alphabetical order.

...which seems like an easy thing to add later by having a "hookdir" in
addition to "hookcmd", i.e. just specify a glob there instead of a
cmd/path.

You already use "hookdir" for something else though, so that's a bit
confusing, perhaps s/hookcmd/definehookcmd/ would be less confusing, or
perhaps more confusing...

> [...]
> +[[execution-ordering]]
> +=== Execution ordering
> +
> +We may find that config order is insufficient for some users; for example,
> +config order makes it difficult to add a new hook to the system or global config
> +which runs at the end of the hook list. A new ordering schema should be:
> +
> +1) Specified by a `hook.order` config, so that users will not unexpectedly see
> +their order change;
> +
> +2) Either dependency or numerically based.
> +
> +Dependency-based ordering is prone to classic linked-list problems, like a
> +cycles and handling of missing dependencies. But, it paves the way for enabling
> +parallelization if some tasks truly depend on others.
>
> +Numerical ordering makes it tricky for Git to generate suggested ordering
> +numbers for each command, but is easy to determine a definitive order.
> +
> +[[parallelization]]
> +=== Parallelization
> +
> +Users with many hooks might want to run them simultaneously, if the hooks don't
> +modify state; if one hook depends on another's output, then users will want to
> +specify those dependencies. If we decide to solve this problem, we may want to
> +look to modern build systems for inspiration on how to manage dependencies and
> +parallel tasks.

If you're taking requests it would make me very happy if we had
parallelism in this from day one. It's the kind of thing that's hard to
do by default once a feature is shipped since people will implicitly
depend on it not being there, i.e. we won't know what we're breaking.

I think doing it this way is simple, covers most use cases, and solves a
lot of the problems you note:

1. Don't use config order to execute hooks, use glob'd name order
   regardless of origin. I.e. a system-level hook is called "001-first"
   is executed before a local hook called "999-at-the-end" (or the other
   way around, i.e. hook origin doesn't matter).

2. We execute hooks parallel in that glob order, i.e. a pthread for-loop
   that starts the 001-first task first, eventually getting to
   999-at-the-end N at a time. I.e. the same as:

       parallel --jobs N --halt-on-error soon,fail=1" ::: <hooks-in-glob-order>

   This allows for parallelism but guarantees the very useful case of
   having a global log hook being guaranteed to execute.

3. A hook can define "parallel=no" in its config. We'll then run it
   while no other hook is running.

4. We don't attempt to do dependencies etc, if you need that sort of
   complexity you can just make one of the hooks be a hook runner as
   users do now for the common "make it parallel" case.

It's a relatively small change to the code you have already. I.e. the
for_each() in run_hooks() would be called N times for each continuous
glob'd parallel/non-parallel segment, and hook_list()'s config parsing
would learn to spew those out as a list-of-lists.

This also gives you a rudimentary implementation of the dependency
schema you proposed for free. I.e. a definition of (pseudocode):

    hookcmd=000-first
    parallel=no

    hookcmd=250-middle-abc
    hookcmd=250-middle-xyz

    hookcmd=300-gather
    parallel=no

    hookcmd=999-the-end

Would result in the pseudocode execution of;

    segments=[[000-first],
              [250-middle-abc, 250-middle-xyz],
              [300-gather],
              [999-the-end]]
    for each s in segments:
        ok = run_in_parallel(s)
        last if !ok # or, depending on "early exit?" config

I.e.:

 * The common case of people adding N hooks won't take sum(N) time.

 * parallel=no hooks aren't run in parallel with other non-parallel
   hooks

 * We support a rudimentary dependency schema as a side-effect,
   i.e. defining 300-gather as non-parallel allows it to act as the sole
   "reduce" step in a map/reduce in a "map" step started with the 250-*
   hooks.

> +[[securing-hookdir-hooks]]
> +=== Securing hookdir hooks
> +
> +With the design as written in this doc, it's still possible for a malicious user
> +to modify `.git/config` to include `hook.pre-receive.command = rm -rf /`, then
> +zip their repo and send it to another user. It may be necessary to teach Git to
> +only allow inlined hooks like this if they were configured outside of the local
> +scope (in other words, only run hookcmds, and only allow hookcmds to be
> +configured in global or system scope); or another approach, like a list of safe
> +projects, might be useful. It may also be sufficient (or at least useful) to
> +teach a `hook.disableAll` config or similar flag to the Git executable.

I think this part of the doc should note a bit of the context in
https://lore.kernel.org/git/20171002234517.GV19555@aiede.mtv.corp.google.com/

I.e. even if we get a 100% secure hook implementation we've done
practically nothing for overall security, since we'll still run the
pager, aliases etc. from that local repo.

This is a great step in the right direction, but it behooves us to note
that, so some user reading this documentation without context doesn't
think inspecting untrusted repositories like that is safe just because
they set the right hook settings in their config (once what's being
proposed here is implemented).

  parent reply	other threads:[~2020-10-07  9:23 UTC|newest]

Thread overview: 170+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-21 18:54 [PATCH v2 0/4] propose config-based hooks Emily Shaffer
2020-05-21 18:54 ` [PATCH v2 1/4] doc: propose hooks managed by the config Emily Shaffer
2020-05-22 10:13   ` Phillip Wood
2020-06-09 20:26     ` Emily Shaffer
2020-05-21 18:54 ` [PATCH v2 2/4] hook: scaffolding for git-hook subcommand Emily Shaffer
2020-05-21 18:54 ` [PATCH v2 3/4] hook: add list command Emily Shaffer
2020-05-22 10:27   ` Phillip Wood
2020-06-09 21:49     ` Emily Shaffer
2020-08-17 13:36       ` Phillip Wood
2020-05-24 23:00   ` Johannes Schindelin
2020-05-27 23:37     ` Emily Shaffer
2020-05-21 18:54 ` [PATCH v2 4/4] hook: add --porcelain to " Emily Shaffer
2020-05-24 23:00   ` Johannes Schindelin
2020-05-25  0:29     ` Johannes Schindelin
2020-07-28 22:24 ` [PATCH v3 0/6] propose config-based hooks Emily Shaffer
2020-07-28 22:24   ` [PATCH v3 1/6] doc: propose hooks managed by the config Emily Shaffer
2020-07-28 22:24   ` [PATCH v3 2/6] hook: scaffolding for git-hook subcommand Emily Shaffer
2020-07-28 22:24   ` [PATCH v3 3/6] hook: add list command Emily Shaffer
2020-07-28 22:24   ` [PATCH v3 4/6] hook: add --porcelain to " Emily Shaffer
2020-07-28 22:24   ` [RFC PATCH v3 5/6] parse-options: parse into argv_array Emily Shaffer
2020-07-29 19:33     ` Junio C Hamano
2020-07-30 23:41       ` Junio C Hamano
2020-07-28 22:24   ` [RFC PATCH v3 6/6] hook: add 'run' subcommand Emily Shaffer
2020-09-09  0:49   ` [PATCH v4 0/9] propose config-based hooks Emily Shaffer
2020-09-09  0:49     ` [PATCH v4 1/9] doc: propose hooks managed by the config Emily Shaffer
2020-09-23 22:59       ` Jonathan Tan
2020-09-24 21:54         ` Emily Shaffer
2020-10-07  9:23       ` Ævar Arnfjörð Bjarmason [this message]
2020-10-22  0:58         ` Emily Shaffer
2020-10-23 19:10           ` Ævar Arnfjörð Bjarmason
2020-10-29 15:38             ` Emily Shaffer
2020-10-29 20:04               ` Ævar Arnfjörð Bjarmason
2020-09-09  0:49     ` [PATCH v4 2/9] hook: scaffolding for git-hook subcommand Emily Shaffer
2020-10-05 23:24       ` Jonathan Nieder
2020-10-06 19:06         ` Emily Shaffer
2020-09-09  0:49     ` [PATCH v4 3/9] hook: add list command Emily Shaffer
2020-09-11 13:27       ` Phillip Wood
2020-09-11 16:51         ` Emily Shaffer
2020-09-23 23:04       ` Jonathan Tan
2020-10-06 20:46         ` Emily Shaffer
2020-09-27 19:23       ` Martin Ågren
2020-10-06 20:20         ` Emily Shaffer
2020-10-05 23:27       ` Jonathan Nieder
2020-09-09  0:49     ` [PATCH v4 4/9] hook: add --porcelain to " Emily Shaffer
2020-09-28 19:29       ` Josh Steadmon
2020-09-09  0:49     ` [PATCH v4 5/9] parse-options: parse into strvec Emily Shaffer
2020-10-05 23:30       ` Jonathan Nieder
2020-10-06  4:49         ` Junio C Hamano
2020-09-09  0:49     ` [PATCH v4 6/9] hook: add 'run' subcommand Emily Shaffer
2020-09-11 13:30       ` Phillip Wood
2020-09-28 19:29       ` Josh Steadmon
2020-10-05 23:39       ` Jonathan Nieder
2020-10-06 22:57         ` Emily Shaffer
2020-09-09  0:49     ` [PATCH v4 7/9] hook: replace run-command.h:find_hook Emily Shaffer
2020-09-09 20:32       ` Junio C Hamano
2020-09-10 19:08         ` Emily Shaffer
2020-09-23 23:20       ` Jonathan Tan
2020-10-05 23:42       ` Jonathan Nieder
2020-09-09  0:49     ` [PATCH v4 8/9] commit: use config-based hooks Emily Shaffer
2020-09-10 13:50       ` Phillip Wood
2020-09-10 22:21         ` Junio C Hamano
2020-09-23 23:47       ` Jonathan Tan
2020-10-05 21:27         ` Emily Shaffer
2020-10-05 23:48           ` Jonathan Nieder
2020-10-06 19:08             ` Emily Shaffer
2020-09-09  0:49     ` [PATCH v4 9/9] run_commit_hook: take strvec instead of varargs Emily Shaffer
2020-09-10 14:16       ` Phillip Wood
2020-09-11 13:20         ` Phillip Wood
2020-09-09 21:04     ` [PATCH v4 0/9] propose config-based hooks Junio C Hamano
2020-10-14 23:24     ` [PATCH v5 0/8] propose config-based hooks (part I) Emily Shaffer
2020-10-14 23:24       ` [PATCH v5 1/8] doc: propose hooks managed by the config Emily Shaffer
2020-10-15 16:31         ` Ævar Arnfjörð Bjarmason
2020-10-16 17:29           ` Junio C Hamano
2020-10-21 23:37           ` Emily Shaffer
2020-10-14 23:24       ` [PATCH v5 2/8] hook: scaffolding for git-hook subcommand Emily Shaffer
2020-10-14 23:24       ` [PATCH v5 3/8] hook: add list command Emily Shaffer
2020-10-14 23:24       ` [PATCH v5 4/8] hook: include hookdir hook in list Emily Shaffer
2020-10-14 23:24       ` [PATCH v5 5/8] hook: implement hookcmd.<name>.skip Emily Shaffer
2020-10-14 23:24       ` [PATCH v5 6/8] parse-options: parse into strvec Emily Shaffer
2020-10-14 23:24       ` [PATCH v5 7/8] hook: add 'run' subcommand Emily Shaffer
2020-10-14 23:24       ` [PATCH v5 8/8] hook: replace find_hook() with hook_exists() Emily Shaffer
2020-12-05  1:45       ` [PATCH v6 00/17] propose config-based hooks (part I) Emily Shaffer
2020-12-05  1:45         ` [PATCH 01/17] doc: propose hooks managed by the config Emily Shaffer
2020-12-05  1:45         ` [PATCH 02/17] hook: scaffolding for git-hook subcommand Emily Shaffer
2020-12-05  1:45         ` [PATCH 03/17] hook: add list command Emily Shaffer
2020-12-05  1:45         ` [PATCH 04/17] hook: include hookdir hook in list Emily Shaffer
2020-12-05  1:45         ` [PATCH 05/17] hook: respect hook.runHookDir Emily Shaffer
2020-12-05  1:45         ` [PATCH 06/17] hook: implement hookcmd.<name>.skip Emily Shaffer
2020-12-05  1:45         ` [PATCH 07/17] parse-options: parse into strvec Emily Shaffer
2020-12-05  1:45         ` [PATCH 08/17] hook: add 'run' subcommand Emily Shaffer
2020-12-11 10:15           ` Phillip Wood
2020-12-15 21:41             ` Emily Shaffer
2020-12-05  1:45         ` [PATCH 09/17] hook: replace find_hook() with hook_exists() Emily Shaffer
2020-12-05  1:46         ` [PATCH 10/17] hook: support passing stdin to hooks Emily Shaffer
2020-12-05  1:46         ` [PATCH 11/17] run-command: allow stdin for run_processes_parallel Emily Shaffer
2020-12-05  1:46         ` [PATCH 12/17] hook: allow parallel hook execution Emily Shaffer
2020-12-05  1:46         ` [PATCH 13/17] hook: allow specifying working directory for hooks Emily Shaffer
2020-12-05  1:46         ` [PATCH 14/17] run-command: add stdin callback for parallelization Emily Shaffer
2020-12-05  1:46         ` [PATCH 15/17] hook: provide stdin by string_list or callback Emily Shaffer
2020-12-08 21:09           ` SZEDER Gábor
2020-12-08 22:11             ` Emily Shaffer
2020-12-05  1:46         ` [PATCH 16/17] run-command: allow capturing of collated output Emily Shaffer
2020-12-05  1:46         ` [PATCH 17/17] hooks: allow callers to capture output Emily Shaffer
2020-12-16  0:34         ` [PATCH v6 00/17] propose config-based hooks (part I) Josh Steadmon
2020-12-16  0:56           ` Junio C Hamano
2020-12-16 20:16             ` Emily Shaffer
2020-12-16 23:32               ` Junio C Hamano
2020-12-18  2:07                 ` Emily Shaffer
2020-12-18  5:29                   ` Junio C Hamano
2020-12-22  0:02         ` [PATCH v7 " Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 01/17] doc: propose hooks managed by the config Emily Shaffer
2021-01-23 15:38             ` Ævar Arnfjörð Bjarmason
2021-01-29 23:52               ` Emily Shaffer
2021-02-01 22:11             ` Junio C Hamano
2021-03-10 19:30               ` Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 02/17] hook: scaffolding for git-hook subcommand Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 03/17] hook: add list command Emily Shaffer
2021-01-31  3:10             ` Jonathan Tan
2021-02-09 21:06               ` Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 04/17] hook: include hookdir hook in list Emily Shaffer
2021-01-31  3:20             ` Jonathan Tan
2021-02-09 22:05               ` Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 05/17] hook: respect hook.runHookDir Emily Shaffer
2021-01-31  3:35             ` Jonathan Tan
2021-02-09 22:31               ` Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 06/17] hook: implement hookcmd.<name>.skip Emily Shaffer
2021-01-31  3:40             ` Jonathan Tan
2021-02-09 22:57               ` Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 07/17] parse-options: parse into strvec Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 08/17] hook: add 'run' subcommand Emily Shaffer
2021-01-31  4:22             ` Jonathan Tan
2021-02-11 22:44               ` Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 09/17] hook: replace find_hook() with hook_exists() Emily Shaffer
2021-01-31  4:39             ` Jonathan Tan
2021-02-12 22:15               ` Emily Shaffer
2021-02-18 22:23               ` Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 10/17] hook: support passing stdin to hooks Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 11/17] run-command: allow stdin for run_processes_parallel Emily Shaffer
2021-02-01  5:38             ` Jonathan Tan
2021-02-19 20:23               ` Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 12/17] hook: allow parallel hook execution Emily Shaffer
2021-02-01  6:04             ` Jonathan Tan
2021-02-22 21:46               ` Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 13/17] hook: allow specifying working directory for hooks Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 14/17] run-command: add stdin callback for parallelization Emily Shaffer
2021-02-01  6:51             ` Jonathan Tan
2021-02-22 23:38               ` Emily Shaffer
2021-02-23 19:33                 ` Jonathan Tan
2021-03-10 18:24                   ` Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 15/17] hook: provide stdin by string_list or callback Emily Shaffer
2021-02-01  7:04             ` Jonathan Tan
2021-02-23 19:52               ` Emily Shaffer
2021-02-25 20:56                 ` Jonathan Tan
2021-03-02  1:47                   ` Emily Shaffer
2021-03-02 23:33                     ` Jonathan Tan
2020-12-22  0:02           ` [PATCH v7 16/17] run-command: allow capturing of collated output Emily Shaffer
2020-12-22  0:02           ` [PATCH v7 17/17] hooks: allow callers to capture output Emily Shaffer
2020-12-22  2:11           ` [PATCH v7 00/17] propose config-based hooks (part I) Junio C Hamano
2020-12-28 18:34             ` Emily Shaffer
2020-12-28 22:50               ` Junio C Hamano
2020-12-28 22:37           ` [PATCH v3 18/17] doc: make git-hook.txt point of truth Emily Shaffer
2020-12-28 22:39             ` Emily Shaffer
2021-01-29 23:59           ` [PATCH v7 00/17] propose config-based hooks (part I) Emily Shaffer
2021-02-16 19:46           ` Josh Steadmon
2021-02-16 22:47             ` Junio C Hamano
2021-02-17 21:21               ` Josh Steadmon
2021-02-17 23:07                 ` Junio C Hamano
2021-02-25 19:50                   ` Junio C Hamano
2021-03-01 21:51                     ` Emily Shaffer
2021-03-01 22:19                       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mu0ygzk1.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).