git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Hook management via 'git hooks' command
@ 2019-11-16  1:11 Emily Shaffer
  2019-11-16  5:45 ` brian m. carlson
  0 siblings, 1 reply; 12+ messages in thread
From: Emily Shaffer @ 2019-11-16  1:11 UTC (permalink / raw)
  To: git
  Cc: brian m. carlson, Jonathan Nieder,
	Ævar Arnfjörð Bjarmason

Hi all.

The topic of multihook support has come up before; here are a handful of
conversations I could find on it:

 brian m. carlson's proposal for multiple hooks via .git/hooks/pre-commit.d/:
   https://public-inbox.org/git/20190424004948.728326-1-sandals@crustytoothpaste.net/

 Jonathan Nieder's discussion of hooks as an attack vector:
   https://public-inbox.org/git/20171002234517.GV19555@aiede.mtv.corp.google.com/

 Ævar Arnfjörð Bjarmason's RFC for multihooks in a similar manner to brian's:
   https://public-inbox.org/git/CACBZZX6j6q2DUN_Z-Pnent1u714dVNPFBrL_PiEQyLmCzLUVxg@mail.gmail.com/

As I consider these conversations, I think about what goals we are aiming
for:

 1. Let a user run more than one hook on a single event.
 2. Let a user determine the order of those hooks.
 3. Let a user version-control their hooks without kludging. (This is actually
    largely solved by core.hookName, thanks to Ævar.)
 4. Make it hard for a user to run a hook they did not expect to run.
 5. Don't break users who currently have hooks in $REPO/.git/hooks, but provide
    a path for them to stop using hooks from that directory.

It seems that brian's thread stalled out in v2, but as I understand it, the
method was:

 - For each codepath which executes a hook:
   - Check whether .git/hooks/${hookname} is valid (exists, executable)
   - If not, check whether .git/hooks/${hookname}.d exists, and run each script
     within, in alphabetical order.

This approach seems similar to one that Ævar suggested in their RFC linked
above.

Benefits of that approach include:
 - A familiar Unixlike interface
 - Previously configured hooks will Just Work

Drawbacks of that approach include:
 - Still an attack vector, unless core.hooksDirectory is set
 - Still need to configure a hook for each repo, unless core.hooksDirectory is
   set
 - Hooks are still not version controlled, unless core.hooksDirectory is set

This meets goals 1, 2, and 5. Using core.hookPath makes this approach meet 3 as
well.

An alternate suggestion from Jonathan Nieder seems to look something like:

 - Learn a core.${hookname} config.
 - Teach all hook executing codepaths to check core.${hookname}
 - Either add multiple core.${hookname} entries and order ...TBD? or put lots of
   hooks on a single config entry and run in order on that line.

Benefits of the config approach include:
 - Easy to configure global, repo-wide, worktree-wide hooks
 - Long-term, the decision of which hooks to run is left entirely to the user
 - The hooks themselves can be version controlled, even distributed with the
   repo (a user could specify ${repopath}/hooks/pre-push-lint if they wanted,
   and we chose to allow it)

Drawbacks of the config approach include:
 - Different enough from the current approach to be unusual. Coming away from
   the transition period will require some thought.
 - The ordering of hooks may be confusing.
 - It may be difficult to choose _not_ to run a global hook from a local
   project.

This meets goals 3, 4, and 5; it could meet 1 and 2 but needs to be fleshed out
more.

Meanwhile, I have some other concerns when I read through the source:

 - I don't think I saw a codified list of all the possible hooks. Meaning,
   find_hook() is called with a freeform string, not from some enum, and it's
   hard to programmatically list all places where a hook could be used. This
   came up when I was looking at bugreport; even the template
   ${hookname}.sample files prepopulated in .git/hooks aren't a comprehensive
   list of all hooks mentioned in `git help hooks`. This feels like a lack of
   assurance that all hooks are well-documented.

Please do let me know if I missed something in my attempt to frame the "story so
far".

Here's my suggestion.

 - Let configs handle the hook list and ordering, via 'hook.{hookname}' which
   can be specified more than once.
   - global config: hook.update = /path/to/firsthook
     user config: hook.update = /path/to/secondhook
     worktree config: hook.update = -/path/to/firsthook #eliminate the firsthook
      call
   - global config: hook.update = /path/to/firsthook
     repo config: hook.update = /path/to/secondhook
     repo config: hook.update = ^/path/to/firsthook #move firsthook execution
       after secondhook for this project
 - Let users decide whether they want to check core.hookDir in addition to their
   config, instead of their config, or not at all, via a config
   'hook.runHookDir' "hookdir-first", "hookdir-only", "config-only", etc.
 - Let users ask to be notified if running a hook from .git/hooks via a config
   'hook.warnHookDir'. (Mentioning .git/hooks rather than core.hookDir is
   intentional here.) Alternatively, ask for 'hook.warn' with values "hookdir",
   "all" depending on how trusting the user feels.
   - If we want to phase out .git/hooks, we can make this notification opt-out
     instead of opt-in, someday.
   - between runHookDir and warnHookDir, users are able to phase out
     .git/hooks scripts at their own pace.
 - Abstract most of this via a 'git hooks' command.
   - 'git hooks add <hookname> [--<scope>] <path>' to append another hook;
     let the scope setting match 'git config'.
   - 'git hooks show [<hookname>]' to indicate which hooks will be run, either
     on a specific event or for all events, and in which order, as well as each
     hook's config origin
   - 'git hooks edit <hookname>' to modify the hook order, or choose not to run
     a hook locally
   - By managing it through a command, we can reorder the contents of config
     files if it makes sense to do so.
 - Add a hook library.
   - Optionally specify all hook events via enum, so we aren't string-typing
     calls to find_hook() anymore.
   - Resolve configs into a list of hooks to run by concatenating them together
     in config precedence order.
     - Also allow configs formatted like "-<path>" to remove an
       earlier-configured invocation of that hook, or "^<path>" to move the
       earlier-configured invocation to this point in the execution order
   - Move the find_hook() implementation to here, to account for the multihook
     ordering

I think it reaches the goals I mentioned:

 1. User can specify more than one hook per event by adding more configs or
    running 'git hooks add <event>'.
 2. User can determine the order by reordering their configs in file, using
    "^/hookpath" notation, or using 'git hooks edit'.
 3. User can point to a hook script which is checked into the repo they want it
    to run on, another repo in their filesystem, or anywhere at all. Contrary to
    the behavior of core.hookDir, users can manage each hook in a different repo
    if they wish (maybe the git-secrets hooks all live in the git-secrets repo,
    and the Gerrit Change-Id hook lives in the gerrit repo, and each project's
    pre-commit formatting hooks live in their own projects' repos).
 4. Users who have "bought in" to the config-based hook management can ask to be
    warned if a hook is run from .git/hooks, and can turn off running hooks from
    .git/hooks entirely if they wish.
 5. By defaulting to hook.runHookDir=hookdir-first or hookdir-only, we can avoid
    breaking users who are using .git/hooks/ today. The former gives value-add -
    the user can start using config hooks seamlessly - while the latter gives
    old functionality only.

The other benefits I can think of:
 - Having a friendly porcelain command to manage hooks makes life easier for the
   Git and Unix uninitiated.
 - If 'git hooks edit' UI looks like 'git rebase --interactive' UI, then users
   may already feel at home using it.
 - If organizations are already distributing global configs, they can also
   distribute hooks such as git-secrets or other internal hooks if they wish.
 - Users who want to use a hook that comes with a repo may still be subject to
   attack if the hook shipped by the repo changes under their feet (although
   they have explicitly asked Git to run that hook, so the threat doesn't feel
   different from running a shell script from a repo manually).

The drawbacks I can think of:
 - It is a much more complicated approach than adding .git/hooks/${hookname}.d.
 - The config syntax can clutter the output if users want to examine their
   config with 'git config --list'.
 - It is still possible for someone to attack via a zip file if they manipulate
   .git/config to use a hook they specified in the repo.

The issue with .git/config is still a pretty big one, but my own thought is that
we should treat it as a separate piece. Yes, those kinds of attacks are still
possible as long as either .git/config or .git/hooks are present in the repo
directory. But coding around .git/config will lead to some cruft that won't make
sense once repo-local config files are made more secure (for example, one
solution might be to ignore local and worktree configs when evaluating the
hooks list, which would be inconvenient and no longer make sense if those config
scopes are secured).

I look forward to the discussion. Thanks for your thoughts.
 - Emily

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hook management via 'git hooks' command
  2019-11-16  1:11 [RFC] Hook management via 'git hooks' command Emily Shaffer
@ 2019-11-16  5:45 ` brian m. carlson
  2019-11-18 22:38   ` Emily Shaffer
  0 siblings, 1 reply; 12+ messages in thread
From: brian m. carlson @ 2019-11-16  5:45 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, Jonathan Nieder, Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 4230 bytes --]

On 2019-11-16 at 01:11:25, Emily Shaffer wrote:
> Here's my suggestion.
> 
>  - Let configs handle the hook list and ordering, via 'hook.{hookname}' which
>    can be specified more than once.
>    - global config: hook.update = /path/to/firsthook
>      user config: hook.update = /path/to/secondhook
>      worktree config: hook.update = -/path/to/firsthook #eliminate the firsthook
>       call
>    - global config: hook.update = /path/to/firsthook
>      repo config: hook.update = /path/to/secondhook
>      repo config: hook.update = ^/path/to/firsthook #move firsthook execution
>        after secondhook for this project

I'd like to hear more about how we handle multiple hooks that are
repo-specific and don't live in the PATH.  This is a common situation
for existing tools that handle multiple hooks, and it's one I think we
should support.

Perhaps we add a "git hook execute" subcommand that executes scripts
from the hook directory.

>  - Let users decide whether they want to check core.hookDir in addition to their
>    config, instead of their config, or not at all, via a config
>    'hook.runHookDir' "hookdir-first", "hookdir-only", "config-only", etc.
>  - Let users ask to be notified if running a hook from .git/hooks via a config
>    'hook.warnHookDir'. (Mentioning .git/hooks rather than core.hookDir is
>    intentional here.) Alternatively, ask for 'hook.warn' with values "hookdir",
>    "all" depending on how trusting the user feels.
>    - If we want to phase out .git/hooks, we can make this notification opt-out
>      instead of opt-in, someday.
>    - between runHookDir and warnHookDir, users are able to phase out
>      .git/hooks scripts at their own pace.
>  - Abstract most of this via a 'git hooks' command.
>    - 'git hooks add <hookname> [--<scope>] <path>' to append another hook;
>      let the scope setting match 'git config'.
>    - 'git hooks show [<hookname>]' to indicate which hooks will be run, either
>      on a specific event or for all events, and in which order, as well as each
>      hook's config origin
>    - 'git hooks edit <hookname>' to modify the hook order, or choose not to run
>      a hook locally
>    - By managing it through a command, we can reorder the contents of config
>      files if it makes sense to do so.
>  - Add a hook library.
>    - Optionally specify all hook events via enum, so we aren't string-typing
>      calls to find_hook() anymore.
>    - Resolve configs into a list of hooks to run by concatenating them together
>      in config precedence order.
>      - Also allow configs formatted like "-<path>" to remove an
>        earlier-configured invocation of that hook, or "^<path>" to move the
>        earlier-configured invocation to this point in the execution order
>    - Move the find_hook() implementation to here, to account for the multihook
>      ordering

I think this addresses most of the concerns that I had about ordering.
It is still a little suboptimal that we're relying on the ordering of
the config file, since it makes things equivalent to numbered files in
.d directories hard.

Possibly as an alternative to the ^ syntax, we could make the hook value
be of the form "key program", where key is a sort key (e.g., a number)
and program is the program to run.  We pick normal config file ordering
if the keys are identical.  Then if the system config wants to have a
hook that always runs at the end, it can do so easily.

In addition, we should be sure that attempting to remove a hook which
doesn't exist isn't an error, since a user might want to set their
~/.gitconfig file to always unset a global hook that may or may not
exist.

We also need to address exit codes with multiple hooks and whether we
fail fast or not.  My series may provide some valuable options here, or
we may choose to go with a single, simpler approach.  Whatever we do, we
should document the behavior clearly.

Overall, though, I think this approach has a lot of potential and I feel
positive about it.  Thanks for bringing this up again in a productive
and helpful way.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hook management via 'git hooks' command
  2019-11-16  5:45 ` brian m. carlson
@ 2019-11-18 22:38   ` Emily Shaffer
  2019-11-19  0:51     ` brian m. carlson
  0 siblings, 1 reply; 12+ messages in thread
From: Emily Shaffer @ 2019-11-18 22:38 UTC (permalink / raw)
  To: brian m. carlson, git, Jonathan Nieder,
	Ævar Arnfjörð Bjarmason

On Sat, Nov 16, 2019 at 05:45:01AM +0000, brian m. carlson wrote:
> On 2019-11-16 at 01:11:25, Emily Shaffer wrote:
> > Here's my suggestion.
> > 
> >  - Let configs handle the hook list and ordering, via 'hook.{hookname}' which
> >    can be specified more than once.
> >    - global config: hook.update = /path/to/firsthook
> >      user config: hook.update = /path/to/secondhook
> >      worktree config: hook.update = -/path/to/firsthook #eliminate the firsthook
> >       call
> >    - global config: hook.update = /path/to/firsthook
> >      repo config: hook.update = /path/to/secondhook
> >      repo config: hook.update = ^/path/to/firsthook #move firsthook execution
> >        after secondhook for this project
> 
> I'd like to hear more about how we handle multiple hooks that are
> repo-specific and don't live in the PATH.  This is a common situation
> for existing tools that handle multiple hooks, and it's one I think we
> should support.

I guess I'm confused about where PATH comes into play. Do you mean that
the hook being run relies on PATH to be set appropriately? I had
envisioned absolute paths in the config.

> Perhaps we add a "git hook execute" subcommand that executes scripts
> from the hook directory.

Can you give an example of when you'd use it? I'm not understanding your
concern and I think an example use case would help me see what you mean.

> 
> >  - Let users decide whether they want to check core.hookDir in addition to their
> >    config, instead of their config, or not at all, via a config
> >    'hook.runHookDir' "hookdir-first", "hookdir-only", "config-only", etc.
> >  - Let users ask to be notified if running a hook from .git/hooks via a config
> >    'hook.warnHookDir'. (Mentioning .git/hooks rather than core.hookDir is
> >    intentional here.) Alternatively, ask for 'hook.warn' with values "hookdir",
> >    "all" depending on how trusting the user feels.
> >    - If we want to phase out .git/hooks, we can make this notification opt-out
> >      instead of opt-in, someday.
> >    - between runHookDir and warnHookDir, users are able to phase out
> >      .git/hooks scripts at their own pace.
> >  - Abstract most of this via a 'git hooks' command.
> >    - 'git hooks add <hookname> [--<scope>] <path>' to append another hook;
> >      let the scope setting match 'git config'.
> >    - 'git hooks show [<hookname>]' to indicate which hooks will be run, either
> >      on a specific event or for all events, and in which order, as well as each
> >      hook's config origin
> >    - 'git hooks edit <hookname>' to modify the hook order, or choose not to run
> >      a hook locally
> >    - By managing it through a command, we can reorder the contents of config
> >      files if it makes sense to do so.
> >  - Add a hook library.
> >    - Optionally specify all hook events via enum, so we aren't string-typing
> >      calls to find_hook() anymore.
> >    - Resolve configs into a list of hooks to run by concatenating them together
> >      in config precedence order.
> >      - Also allow configs formatted like "-<path>" to remove an
> >        earlier-configured invocation of that hook, or "^<path>" to move the
> >        earlier-configured invocation to this point in the execution order
> >    - Move the find_hook() implementation to here, to account for the multihook
> >      ordering
> 
> I think this addresses most of the concerns that I had about ordering.
> It is still a little suboptimal that we're relying on the ordering of
> the config file, since it makes things equivalent to numbered files in
> .d directories hard.

Hm, I suppose I don't see why, if the final ordering is determined by
the .git/config (or future replacement for that). Can you explain what
you mean? I want to understand where you're coming from.

> 
> Possibly as an alternative to the ^ syntax, we could make the hook value
> be of the form "key program", where key is a sort key (e.g., a number)
> and program is the program to run.  We pick normal config file ordering
> if the keys are identical.  Then if the system config wants to have a
> hook that always runs at the end, it can do so easily.

Interesting. This way if you decide after you've set up all your configs
just so that you really want something to run at the end of the update
event, you can change one place, not n=number of Git repos. (I do still
want to be able to say "don't run that global hook in this project"
though.)

> 
> In addition, we should be sure that attempting to remove a hook which
> doesn't exist isn't an error, since a user might want to set their
> ~/.gitconfig file to always unset a global hook that may or may not
> exist.

I'd be comfortable with a warning when exiting 'git hook edit' mode and
silence when actually running the hook list.

> 
> We also need to address exit codes with multiple hooks and whether we
> fail fast or not.  My series may provide some valuable options here, or
> we may choose to go with a single, simpler approach.  Whatever we do, we
> should document the behavior clearly.

Agree. I'll take a look at your series this week to see how you solved
it; this feels like a place where it'd be easy to give too many knobs
that users would never pull.

I was going to write my gut feeling about what we should do, but my gut
came up with reasonable arguments for lots of approaches. So I'll read
what came first before I say anything more :)

> 
> Overall, though, I think this approach has a lot of potential and I feel
> positive about it.  Thanks for bringing this up again in a productive
> and helpful way.

Thanks for being open minded about it.

 - Emily

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hook management via 'git hooks' command
  2019-11-18 22:38   ` Emily Shaffer
@ 2019-11-19  0:51     ` brian m. carlson
  2019-11-23  1:19       ` Emily Shaffer
  0 siblings, 1 reply; 12+ messages in thread
From: brian m. carlson @ 2019-11-19  0:51 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, Jonathan Nieder, Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 5778 bytes --]

On 2019-11-18 at 22:38:19, Emily Shaffer wrote:
> On Sat, Nov 16, 2019 at 05:45:01AM +0000, brian m. carlson wrote:
> > On 2019-11-16 at 01:11:25, Emily Shaffer wrote:
> > > Here's my suggestion.
> > > 
> > >  - Let configs handle the hook list and ordering, via 'hook.{hookname}' which
> > >    can be specified more than once.
> > >    - global config: hook.update = /path/to/firsthook
> > >      user config: hook.update = /path/to/secondhook
> > >      worktree config: hook.update = -/path/to/firsthook #eliminate the firsthook
> > >       call
> > >    - global config: hook.update = /path/to/firsthook
> > >      repo config: hook.update = /path/to/secondhook
> > >      repo config: hook.update = ^/path/to/firsthook #move firsthook execution
> > >        after secondhook for this project
> > 
> > I'd like to hear more about how we handle multiple hooks that are
> > repo-specific and don't live in the PATH.  This is a common situation
> > for existing tools that handle multiple hooks, and it's one I think we
> > should support.
> 
> I guess I'm confused about where PATH comes into play. Do you mean that
> the hook being run relies on PATH to be set appropriately? I had
> envisioned absolute paths in the config.

In past discussions, there's been an assumption that hooks in the config
will be found in PATH if they're not specified explicitly, and I assumed
(apparently incorrectly) that the same would be true here.

I do expect folks are going to want to use non-absolute paths, though.
If I'm invoking the git binary in a hook, I don't care whether it exists
in /usr/bin, /usr/local/bin, ~/bin, or somewhere else entirely.  That's
my shell's problem to figure out.

It's also common for folks to use something like "bundle exec" in a hook
to run a linter that's installed by the local package manager, and in
order to do that, you have to honor PATH to find the package manager's
binary.  That could be in a variety of places, and it could end up
changing dynamically in a session due to a tool like RVM.

> > Perhaps we add a "git hook execute" subcommand that executes scripts
> > from the hook directory.
> 
> Can you give an example of when you'd use it? I'm not understanding your
> concern and I think an example use case would help me see what you mean.

Sure.  Currently, if I have pre-push hook, it lives in
.git/hooks/pre-push.  Now, I want to have multiple hooks for that which
are specific to my repo.  Maybe I've stuffed them into
.git/hooks/pre-push.d/hook1 and .git/hooks/pre-push.d/hook2, since
that's where my previous hook management system put them, but I now want
to use those same hooks with the config style and drop the old tool.

I'd like to use "git hook execute pre-push.d/hook1" and "git hook
execute pre-push.d/hook2" to automatically find the right hooks and
invoke them.  Similarly, I could use "git hook execute pre-push" to
execute the old pre-push hook.

I suppose if we continue to keep the existing behavior of changing the
directory and we pass the config options to the shell, then we could
just write "$(git config core.hooksPath || echo
.git/hooks)/pre-push.d/hook1" instead, which, while ugly, gets the job
done.  Then we wouldn't need such a command.

> > I think this addresses most of the concerns that I had about ordering.
> > It is still a little suboptimal that we're relying on the ordering of
> > the config file, since it makes things equivalent to numbered files in
> > .d directories hard.
> 
> Hm, I suppose I don't see why, if the final ordering is determined by
> the .git/config (or future replacement for that). Can you explain what
> you mean? I want to understand where you're coming from.

One of the benefits to using numbered files in a .d directory is that
you can explicitly control ordering of operations.  For example, maybe I
have a per-repo pre-push hook that performs some checks and rejects a
push if something is off.  I also have a pre-push hook for Git LFS that
pushes the Git LFS objects to the remote server if Git LFS is in use.

In this case, I'd always want my sanity-check hook to run first, and so
I'd number it first.  This is fine if both are per-repo, but if the LFS
hook is global, then it's in the wrong order and my LFS objects are
pushed even though my sanity check failed.

> > Possibly as an alternative to the ^ syntax, we could make the hook value
> > be of the form "key program", where key is a sort key (e.g., a number)
> > and program is the program to run.  We pick normal config file ordering
> > if the keys are identical.  Then if the system config wants to have a
> > hook that always runs at the end, it can do so easily.
> 
> Interesting. This way if you decide after you've set up all your configs
> just so that you really want something to run at the end of the update
> event, you can change one place, not n=number of Git repos. (I do still
> want to be able to say "don't run that global hook in this project"
> though.)

Exactly.  A global or per-user commit-msg hook may want to see the final
message before approving or rejecting it, and that wouldn't be possible
without some sort of ordering.

I strongly agree that we should still allow removing higher-level hooks.

> > In addition, we should be sure that attempting to remove a hook which
> > doesn't exist isn't an error, since a user might want to set their
> > ~/.gitconfig file to always unset a global hook that may or may not
> > exist.
> 
> I'd be comfortable with a warning when exiting 'git hook edit' mode and
> silence when actually running the hook list.

Yeah, that's what I'm going for.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hook management via 'git hooks' command
  2019-11-19  0:51     ` brian m. carlson
@ 2019-11-23  1:19       ` Emily Shaffer
  2019-11-25  3:04         ` brian m. carlson
  0 siblings, 1 reply; 12+ messages in thread
From: Emily Shaffer @ 2019-11-23  1:19 UTC (permalink / raw)
  To: brian m. carlson, git, Jonathan Nieder,
	Ævar Arnfjörð Bjarmason

On Tue, Nov 19, 2019 at 12:51:36AM +0000, brian m. carlson wrote:
> On 2019-11-18 at 22:38:19, Emily Shaffer wrote:
> > On Sat, Nov 16, 2019 at 05:45:01AM +0000, brian m. carlson wrote:
> > > On 2019-11-16 at 01:11:25, Emily Shaffer wrote:
> > > > Here's my suggestion.
> > > > 
> > > >  - Let configs handle the hook list and ordering, via 'hook.{hookname}' which
> > > >    can be specified more than once.
> > > >    - global config: hook.update = /path/to/firsthook
> > > >      user config: hook.update = /path/to/secondhook
> > > >      worktree config: hook.update = -/path/to/firsthook #eliminate the firsthook
> > > >       call
> > > >    - global config: hook.update = /path/to/firsthook
> > > >      repo config: hook.update = /path/to/secondhook
> > > >      repo config: hook.update = ^/path/to/firsthook #move firsthook execution
> > > >        after secondhook for this project
> > > 
> > > I'd like to hear more about how we handle multiple hooks that are
> > > repo-specific and don't live in the PATH.  This is a common situation
> > > for existing tools that handle multiple hooks, and it's one I think we
> > > should support.
> > 
> > I guess I'm confused about where PATH comes into play. Do you mean that
> > the hook being run relies on PATH to be set appropriately? I had
> > envisioned absolute paths in the config.
> 
> In past discussions, there's been an assumption that hooks in the config
> will be found in PATH if they're not specified explicitly, and I assumed
> (apparently incorrectly) that the same would be true here.

Ah, I think I see what you mean.

hook.update = security-heuristic-runner

where "security-heuristic-runner" is some compiled binary your employer
purchased from some vendor and distributed directly to your `/bin/`.

No, I had imagined users would achieve that by writing:

hook.update = /bin/security-heuristic-runner


> I do expect folks are going to want to use non-absolute paths, though.
> If I'm invoking the git binary in a hook, I don't care whether it exists
> in /usr/bin, /usr/local/bin, ~/bin, or somewhere else entirely.  That's
> my shell's problem to figure out.

Hm. Do you mean:

hook.update = git grep "something"

Or do you mean:

hook.update = ~/grephook.sh

grephook.sh:
  #!/bin/bash

  git grep "something" >output
  ... do something with output ...

I suppose I need to understand better how $PATH works with the latter
scenario, but my gut says "if you didn't worry about where to find the
Git binary from your script before, why are you going to start caring
now".

This led me to wonder: "Should we allow someone to pass arguments via
the hook config?" Or to put it another way, "Should we allow 'hook.update
= grep -Rin blahblah >audit.log'?" I think the answer is no - some hooks
do expect to be given arguments, for example
sequencer.c:run_prepare_commit_msg_hook().

> 
> It's also common for folks to use something like "bundle exec" in a hook
> to run a linter that's installed by the local package manager, and in
> order to do that, you have to honor PATH to find the package manager's
> binary.  That could be in a variety of places, and it could end up
> changing dynamically in a session due to a tool like RVM.

I imagine I'm missing something crucial here about why this isn't an
issue in today's hook implementation, but it would be an issue with a
config-based hook lookup. I don't think I see why the invocation would
be any different, except today find_hook() says "is it in
$GITDIR/hooks/" and tomorrow find_hook() would say "is it in config".

> 
> > > Perhaps we add a "git hook execute" subcommand that executes scripts
> > > from the hook directory.
> > 
> > Can you give an example of when you'd use it? I'm not understanding your
> > concern and I think an example use case would help me see what you mean.
> 
> Sure.  Currently, if I have pre-push hook, it lives in
> .git/hooks/pre-push.  Now, I want to have multiple hooks for that which
> are specific to my repo.  Maybe I've stuffed them into
> .git/hooks/pre-push.d/hook1 and .git/hooks/pre-push.d/hook2, since
> that's where my previous hook management system put them, but I now want
> to use those same hooks with the config style and drop the old tool.
> 
> I'd like to use "git hook execute pre-push.d/hook1" and "git hook
> execute pre-push.d/hook2" to automatically find the right hooks and
> invoke them.  Similarly, I could use "git hook execute pre-push" to
> execute the old pre-push hook.

Hmmm. I think you're describing a scenario like this:

1. I make .git/hooks/pre-push.d/hook1, .git/hooks/pre-push.d/hook2, and
   .git/hooks/pre-push.
2. Hook magic happens upstream, and now instead of living in
   .git/hooks/, my hooks live in ~/.githooks/<reponame>/.
3. I just want to run my hooks now, but where are they?

I don't envision a Git-facilitated "mv .git/hooks/
~/.githooks/<reponame>/" (or whatever). In fact, I think it could break
your scenario, because maybe you write "pre-push" to reference hook1 and
hook2 by absolute path, or by relative path to the repo directory. I
don't like the idea of doing that very much at all, which is why I
proposed hook.warnHookDir (or whatever it is I called it), which we can
later turn on by default if/when we want to chase people off using
.git/hooks/.

To your point, though. The most hands-off way to keep your previous
setup would be to add:

  hook.pre-push = /path/to/myrepo/.git/hooks/pre-push

which would invoke your springboard script and let you do whatever you
want with the contents of .git/hooks/pre-push.d/. The second most
hands-off (but more correct) way would be to add:

  hook.pre-push = /path/to/myrepo/.git/hooks/pre-push.d/hook1
  hook.pre-push = /path/to/myrepo/.git/hooks/pre-push.d/hook2

I think we may be talking past each other, because when I hear you
describe what you want from 'git hook execute <...>', it sounds like you
are asking for 'git hook' to just do:

  $ $(git config core.hookdir)/$3

and that confuses me because I'm having trouble figuring out why you
couldn't just do that yourself. So I'm sure I'm still not understanding
something correctly.

This is kind of the use case I was hoping to account for by providing
'hook.runHookDir' - hooks in .git/hooks/ still "just work", by taking
the contents of that dir into account while composing the list of hooks
to run.

> 
> I suppose if we continue to keep the existing behavior of changing the
> directory and we pass the config options to the shell, then we could
> just write "$(git config core.hooksPath || echo
> .git/hooks)/pre-push.d/hook1" instead, which, while ugly, gets the job
> done.  Then we wouldn't need such a command.

Yeah, I am wondering about when you want to run a hook generically (i.e.
from a noninteractive script) but outside of the context of something in
the Git binary invoking a hook. Are you thinking of Git commands
implemented as scripts?

I wonder if the script use case is better served by something like 'git
hook list pre-push --porcelain' which could compose the full list of
hooks to run, taking into account hook.runHookDir and provide them in a
script-readable way. Or, a simplification on your suggestion: 'git hook
execute pre-push" which does the same thing, but runs them all for you
too.

> > > I think this addresses most of the concerns that I had about ordering.
> > > It is still a little suboptimal that we're relying on the ordering of
> > > the config file, since it makes things equivalent to numbered files in
> > > .d directories hard.
> > 
> > Hm, I suppose I don't see why, if the final ordering is determined by
> > the .git/config (or future replacement for that). Can you explain what
> > you mean? I want to understand where you're coming from.
> 
> One of the benefits to using numbered files in a .d directory is that
> you can explicitly control ordering of operations.  For example, maybe I
> have a per-repo pre-push hook that performs some checks and rejects a
> push if something is off.  I also have a pre-push hook for Git LFS that
> pushes the Git LFS objects to the remote server if Git LFS is in use.
> 
> In this case, I'd always want my sanity-check hook to run first, and so
> I'd number it first.  This is fine if both are per-repo, but if the LFS
> hook is global, then it's in the wrong order and my LFS objects are
> pushed even though my sanity check failed.

Yeah, this is really compelling, and also removes the somewhat wonky ^
proposed just below here. I like this idea quite a lot:

hook.pre-push = 001:/path/to/sanity-checker

I'll have to ponder on the UX of a 'git hook'-facilitated interactive
edit of the hook numbering, though. UX is not my strong point :)

> 
> > > Possibly as an alternative to the ^ syntax, we could make the hook value
> > > be of the form "key program", where key is a sort key (e.g., a number)
> > > and program is the program to run.  We pick normal config file ordering
> > > if the keys are identical.  Then if the system config wants to have a
> > > hook that always runs at the end, it can do so easily.
> > 
> > Interesting. This way if you decide after you've set up all your configs
> > just so that you really want something to run at the end of the update
> > event, you can change one place, not n=number of Git repos. (I do still
> > want to be able to say "don't run that global hook in this project"
> > though.)
> 
> Exactly.  A global or per-user commit-msg hook may want to see the final
> message before approving or rejecting it, and that wouldn't be possible
> without some sort of ordering.
> 
> I strongly agree that we should still allow removing higher-level hooks.
> 
> > > In addition, we should be sure that attempting to remove a hook which
> > > doesn't exist isn't an error, since a user might want to set their
> > > ~/.gitconfig file to always unset a global hook that may or may not
> > > exist.
> > 
> > I'd be comfortable with a warning when exiting 'git hook edit' mode and
> > silence when actually running the hook list.
> 
> Yeah, that's what I'm going for.
> -- 
> brian m. carlson: Houston, Texas, US
> OpenPGP: https://keybase.io/bk2204

Thanks for being patient as I wrapped my head around this enough to
reply.

 - Emily

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hook management via 'git hooks' command
  2019-11-23  1:19       ` Emily Shaffer
@ 2019-11-25  3:04         ` brian m. carlson
  2019-11-25 22:21           ` Emily Shaffer
  0 siblings, 1 reply; 12+ messages in thread
From: brian m. carlson @ 2019-11-25  3:04 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, Jonathan Nieder, Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 4834 bytes --]

On 2019-11-23 at 01:19:24, Emily Shaffer wrote:
> Ah, I think I see what you mean.
> 
> hook.update = security-heuristic-runner
> 
> where "security-heuristic-runner" is some compiled binary your employer
> purchased from some vendor and distributed directly to your `/bin/`.
> 
> No, I had imagined users would achieve that by writing:
> 
> hook.update = /bin/security-heuristic-runner

Yeah, that's what I'm looking for.  The problem is that, for example,
Debian does not guarantee where in PATH a file is.  Having a newer Git
on RHEL or CentOS systems often involves hacking PATH and
LD_LIBRARY_PATH.

> Hm. Do you mean:
> 
> hook.update = git grep "something"
> 
> Or do you mean:
> 
> hook.update = ~/grephook.sh
> 
> grephook.sh:
>   #!/bin/bash
> 
>   git grep "something" >output
>   ... do something with output ...

I had intended to include the latter case, but also allow valid hooks
with multiple argument support.  For example, you could invoke "git lfs
pre-push" directly in your hook, and that is a fully functioning
pre-push hook, and would require a suitable PATH lookup to find your Git
binary.  It accepts the additional arguments that pre-push hooks accept;
right now we basically do the following instead:

----
#!/bin/sh

exec git lfs pre-push "$@"
----

> I suppose I need to understand better how $PATH works with the latter
> scenario, but my gut says "if you didn't worry about where to find the
> Git binary from your script before, why are you going to start caring
> now".
> 
> This led me to wonder: "Should we allow someone to pass arguments via
> the hook config?" Or to put it another way, "Should we allow 'hook.update
> = grep -Rin blahblah >audit.log'?" I think the answer is no - some hooks
> do expect to be given arguments, for example
> sequencer.c:run_prepare_commit_msg_hook().

I think what we want to do in this case is just invoke things in the
shell with extra arguments, like we do with editors.  This means we
don't have to handle PATH or anything else; we just invoke the shell and
let it handle it.  That lets people provide multi-call binaries (like
git lfs) that include hooks inside them.

I do, however, think we should require folks to have a suitable hook
that accepts the right arguments.  So "git grep blahblah" isn't a valid
hook in most cases, because it doesn't take the right arguments and read
the right data from stdin if necessary.

> > I suppose if we continue to keep the existing behavior of changing the
> > directory and we pass the config options to the shell, then we could
> > just write "$(git config core.hooksPath || echo
> > .git/hooks)/pre-push.d/hook1" instead, which, while ugly, gets the job
> > done.  Then we wouldn't need such a command.
> 
> Yeah, I am wondering about when you want to run a hook generically (i.e.
> from a noninteractive script) but outside of the context of something in
> the Git binary invoking a hook. Are you thinking of Git commands
> implemented as scripts?

I'm just thinking about existing hook wrappers that invoke multiple
scripts at the moment, something like how
https://gist.github.com/mjackson/7e602a7aa357cfe37dadcc016710931b works
at the moment and how we'd replace that with a config-based model.

I think using the shell avoids the entire proposal, because it then
becomes trivial to script that in the command and port it over, since we
can use my ugly hack above.  I think I like that better than "git hook
execute" because it's a little more flexible.

> > One of the benefits to using numbered files in a .d directory is that
> > you can explicitly control ordering of operations.  For example, maybe I
> > have a per-repo pre-push hook that performs some checks and rejects a
> > push if something is off.  I also have a pre-push hook for Git LFS that
> > pushes the Git LFS objects to the remote server if Git LFS is in use.
> > 
> > In this case, I'd always want my sanity-check hook to run first, and so
> > I'd number it first.  This is fine if both are per-repo, but if the LFS
> > hook is global, then it's in the wrong order and my LFS objects are
> > pushed even though my sanity check failed.
> 
> Yeah, this is really compelling, and also removes the somewhat wonky ^
> proposed just below here. I like this idea quite a lot:
> 
> hook.pre-push = 001:/path/to/sanity-checker

I think a colon is actually better than my proposal for a space in this
regard, but I'm not picky: anything unambiguous is fine.

> I'll have to ponder on the UX of a 'git hook'-facilitated interactive
> edit of the hook numbering, though. UX is not my strong point :)

I also find I'm not great at UX, so I can't be of much help.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hook management via 'git hooks' command
  2019-11-25  3:04         ` brian m. carlson
@ 2019-11-25 22:21           ` Emily Shaffer
  2019-11-25 22:45             ` Emily Shaffer
  2019-11-26  0:28             ` brian m. carlson
  0 siblings, 2 replies; 12+ messages in thread
From: Emily Shaffer @ 2019-11-25 22:21 UTC (permalink / raw)
  To: brian m. carlson, git, Jonathan Nieder,
	Ævar Arnfjörð Bjarmason

On Mon, Nov 25, 2019 at 03:04:45AM +0000, brian m. carlson wrote:
> On 2019-11-23 at 01:19:24, Emily Shaffer wrote:
> > Ah, I think I see what you mean.
> > 
> > hook.update = security-heuristic-runner
> > 
> > where "security-heuristic-runner" is some compiled binary your employer
> > purchased from some vendor and distributed directly to your `/bin/`.
> > 
> > No, I had imagined users would achieve that by writing:
> > 
> > hook.update = /bin/security-heuristic-runner
> 
> Yeah, that's what I'm looking for.  The problem is that, for example,
> Debian does not guarantee where in PATH a file is.  Having a newer Git
> on RHEL or CentOS systems often involves hacking PATH and
> LD_LIBRARY_PATH.
> 
> > Hm. Do you mean:
> > 
> > hook.update = git grep "something"
> > 
> > Or do you mean:
> > 
> > hook.update = ~/grephook.sh
> > 
> > grephook.sh:
> >   #!/bin/bash
> > 
> >   git grep "something" >output
> >   ... do something with output ...
> 
> I had intended to include the latter case, but also allow valid hooks
> with multiple argument support.  For example, you could invoke "git lfs
> pre-push" directly in your hook, and that is a fully functioning
> pre-push hook, and would require a suitable PATH lookup to find your Git
> binary.  It accepts the additional arguments that pre-push hooks accept;
> right now we basically do the following instead:
> 
> ----
> #!/bin/sh
> 
> exec git lfs pre-push "$@"
> ----
> 
> > I suppose I need to understand better how $PATH works with the latter
> > scenario, but my gut says "if you didn't worry about where to find the
> > Git binary from your script before, why are you going to start caring
> > now".
> > 
> > This led me to wonder: "Should we allow someone to pass arguments via
> > the hook config?" Or to put it another way, "Should we allow 'hook.update
> > = grep -Rin blahblah >audit.log'?" I think the answer is no - some hooks
> > do expect to be given arguments, for example
> > sequencer.c:run_prepare_commit_msg_hook().
> 
> I think what we want to do in this case is just invoke things in the
> shell with extra arguments, like we do with editors.  This means we
> don't have to handle PATH or anything else; we just invoke the shell and
> let it handle it.  That lets people provide multi-call binaries (like
> git lfs) that include hooks inside them.

I think that you are saying we should do the nice equivalent of:

  system("git lfs pre-push");

and tack the args onto the end. (I suppose that's run-command.h, but I'm
trying to use a shorthand that is easy to understand.)

It seems like this would solve the PATH issue, yes. However, I feel
tentative because of pushback on that approach in the bugreport review.
Hmmm. I think this is different, because the user understands that the
hook they configure themselves will be invoked on the shell of their
choosing. That is, I think run-command.h with "C:\myhook.exe" would
still work, no?

Will this approach "just work" for Windows, et al.?

> I do, however, think we should require folks to have a suitable hook
> that accepts the right arguments.  So "git grep blahblah" isn't a valid
> hook in most cases, because it doesn't take the right arguments and read
> the right data from stdin if necessary.

I'm not sure how we would guarantee that. Are you suggesting we should
try dry running a hook somehow when it's added with 'git hook add'? Even
doing that won't stop someone from popping open ~/.gitconfig with nano
and adding their hook that doesn't take the right args that way.

> 
> > > I suppose if we continue to keep the existing behavior of changing the
> > > directory and we pass the config options to the shell, then we could
> > > just write "$(git config core.hooksPath || echo
> > > .git/hooks)/pre-push.d/hook1" instead, which, while ugly, gets the job
> > > done.  Then we wouldn't need such a command.
> > 
> > Yeah, I am wondering about when you want to run a hook generically (i.e.
> > from a noninteractive script) but outside of the context of something in
> > the Git binary invoking a hook. Are you thinking of Git commands
> > implemented as scripts?
> 
> I'm just thinking about existing hook wrappers that invoke multiple
> scripts at the moment, something like how
> https://gist.github.com/mjackson/7e602a7aa357cfe37dadcc016710931b works
> at the moment and how we'd replace that with a config-based model.
> 
> I think using the shell avoids the entire proposal, because it then
> becomes trivial to script that in the command and port it over, since we
> can use my ugly hack above.

Still not sure I understand what the issue was before. Is it that the
trampoline script needs to know $PATH to be able to invoke the child
hooks in hookname.d? Or it's because the current working directory
isn't clear, so a relative path in the trampoline script may not be
resolved well?

> I think I like that better than "git hook
> execute" because it's a little more flexible.
> 
> > > One of the benefits to using numbered files in a .d directory is that
> > > you can explicitly control ordering of operations.  For example, maybe I
> > > have a per-repo pre-push hook that performs some checks and rejects a
> > > push if something is off.  I also have a pre-push hook for Git LFS that
> > > pushes the Git LFS objects to the remote server if Git LFS is in use.
> > > 
> > > In this case, I'd always want my sanity-check hook to run first, and so
> > > I'd number it first.  This is fine if both are per-repo, but if the LFS
> > > hook is global, then it's in the wrong order and my LFS objects are
> > > pushed even though my sanity check failed.
> > 
> > Yeah, this is really compelling, and also removes the somewhat wonky ^
> > proposed just below here. I like this idea quite a lot:
> > 
> > hook.pre-push = 001:/path/to/sanity-checker
> 
> I think a colon is actually better than my proposal for a space in this
> regard, but I'm not picky: anything unambiguous is fine.

Yeah, same. I'm a little worried about a colon because there was some
conversation about being able to pick a hook in a repo from a specific
ref (i.e. I want to run 'update' hook from 'master' all the time even
when I'm examining 'third-party-topic' branch or bisecting
'nasty-bug-branch') and the colon seems to be used handily for ref +
path, but I think these details will work themselves out during
implementation, so I'm not so very worried.
> 
> > I'll have to ponder on the UX of a 'git hook'-facilitated interactive
> > edit of the hook numbering, though. UX is not my strong point :)
> 
> I also find I'm not great at UX, so I can't be of much help.

We're in luck on my end as it seems we have a UX team with open office
hours who can give us an opinion. I'll try to meet with them next week
Thursday; thanks Jonathan Nieder for the pointer.


This sounds like we both are pretty close on the same page, so I think I
will get started in the coming weeks and see if we can get a mockup to
pick at with the implementation details in front of us.

 - Emily

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hook management via 'git hooks' command
  2019-11-25 22:21           ` Emily Shaffer
@ 2019-11-25 22:45             ` Emily Shaffer
  2019-11-26  0:28             ` brian m. carlson
  1 sibling, 0 replies; 12+ messages in thread
From: Emily Shaffer @ 2019-11-25 22:45 UTC (permalink / raw)
  To: brian m. carlson, git, Jonathan Nieder,
	Ævar Arnfjörð Bjarmason

> This sounds like we both are pretty close on the same page, so I think I
> will get started in the coming weeks and see if we can get a mockup to
> pick at with the implementation details in front of us.

Hm. To elaborate (and partially as a reminder to myself) I will try to
get it done in the following order:

1. Implement 'git hook list <hookname>' which reads all the configs.
(User would need to manually add the configs at this stage)

2. (maybe) Implement 'git hook execute <hookname> <arg...>'. This may or
may not be useful; I suppose it would be pretty equivalent to:

 $ git hook list <hookname> | xargs -I% sh % <arg...>

3. Implement config modifiers like 'git hook add', 'git hook edit' etc.

My thinking is that we will have a lot of time with 1. in front of us to
nitpick how we want the config format to look, how the ordering should
go, etc. and it will be a fairly simple implementation. It'll also be
"usable" although not in a particularly friendly way in case someone
wants to try it and see, in a way that the config modifiers by
themselves wouldn't be.

 - Emily

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hook management via 'git hooks' command
  2019-11-25 22:21           ` Emily Shaffer
  2019-11-25 22:45             ` Emily Shaffer
@ 2019-11-26  0:28             ` brian m. carlson
  2019-11-26  0:56               ` Emily Shaffer
  1 sibling, 1 reply; 12+ messages in thread
From: brian m. carlson @ 2019-11-26  0:28 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, Jonathan Nieder, Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 3825 bytes --]

On 2019-11-25 at 22:21:13, Emily Shaffer wrote:
> I think that you are saying we should do the nice equivalent of:
> 
>   system("git lfs pre-push");
> 
> and tack the args onto the end. (I suppose that's run-command.h, but I'm
> trying to use a shorthand that is easy to understand.)

Yeah, I'm proposing we run the command using run-command.c with the
use_shell flag, which does something very much like that, except a
little bit saner.

> It seems like this would solve the PATH issue, yes. However, I feel
> tentative because of pushback on that approach in the bugreport review.
> Hmmm. I think this is different, because the user understands that the
> hook they configure themselves will be invoked on the shell of their
> choosing. That is, I think run-command.h with "C:\myhook.exe" would
> still work, no?

This will always use /bin/sh, as we do for editors.

bash on Windows does understand the Windows paths and works correctly in
this case, AIUI, so that should be fine.

> Will this approach "just work" for Windows, et al.?

Yes.  Windows ships with bash (or in Portable Git, busybox sh), which is
not only required to run the testsuite, but required to invoke editors.

> > I do, however, think we should require folks to have a suitable hook
> > that accepts the right arguments.  So "git grep blahblah" isn't a valid
> > hook in most cases, because it doesn't take the right arguments and read
> > the right data from stdin if necessary.
> 
> I'm not sure how we would guarantee that. Are you suggesting we should
> try dry running a hook somehow when it's added with 'git hook add'? Even
> doing that won't stop someone from popping open ~/.gitconfig with nano
> and adding their hook that doesn't take the right args that way.

We don't need to guarantee that.  We just need to document it, so that
(a) people write hooks in the expected way and (b) if people don't, we
can point them to the documentation and explain why their hooks don't
work.  I can see people thinking of this as a set of commands that just
checks exit statuses, and there's likely to be confusion.

> > I'm just thinking about existing hook wrappers that invoke multiple
> > scripts at the moment, something like how
> > https://gist.github.com/mjackson/7e602a7aa357cfe37dadcc016710931b works
> > at the moment and how we'd replace that with a config-based model.
> > 
> > I think using the shell avoids the entire proposal, because it then
> > becomes trivial to script that in the command and port it over, since we
> > can use my ugly hack above.
> 
> Still not sure I understand what the issue was before. Is it that the
> trampoline script needs to know $PATH to be able to invoke the child
> hooks in hookname.d? Or it's because the current working directory
> isn't clear, so a relative path in the trampoline script may not be
> resolved well?

I think we need to have either (a) a way to explicitly invoke hooks
underneath the hook directory or (b) a shell invocation to allow looking
up the hook directory.  People want to have per-repository hooks that
are not a part of the project, and they need a place to store them,
which is logically the hook directory.

If we want to allow people to have multiple hooks under something like
.git/hooks/pre-push.d, then we need to have a way to wire them up in the
configuration using the correct location of the hook directory instead
of using a helper script like the one I linked above.

We don't know that the hook directory will necessarily be under
.git/hooks, so if the user has moved it elsewhere, we'd want to follow
that.  A relative path would be wrong if the user changed the hook
directory to a different location.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hook management via 'git hooks' command
  2019-11-26  0:28             ` brian m. carlson
@ 2019-11-26  0:56               ` Emily Shaffer
  2019-11-26  2:41                 ` brian m. carlson
  0 siblings, 1 reply; 12+ messages in thread
From: Emily Shaffer @ 2019-11-26  0:56 UTC (permalink / raw)
  To: brian m. carlson, git, Jonathan Nieder,
	Ævar Arnfjörð Bjarmason

On Tue, Nov 26, 2019 at 12:28:06AM +0000, brian m. carlson wrote:
> On 2019-11-25 at 22:21:13, Emily Shaffer wrote:
> > I think that you are saying we should do the nice equivalent of:
> > 
> >   system("git lfs pre-push");
> > 
> > and tack the args onto the end. (I suppose that's run-command.h, but I'm
> > trying to use a shorthand that is easy to understand.)
> 
> Yeah, I'm proposing we run the command using run-command.c with the
> use_shell flag, which does something very much like that, except a
> little bit saner.
> 
> > It seems like this would solve the PATH issue, yes. However, I feel
> > tentative because of pushback on that approach in the bugreport review.
> > Hmmm. I think this is different, because the user understands that the
> > hook they configure themselves will be invoked on the shell of their
> > choosing. That is, I think run-command.h with "C:\myhook.exe" would
> > still work, no?
> 
> This will always use /bin/sh, as we do for editors.
> 
> bash on Windows does understand the Windows paths and works correctly in
> this case, AIUI, so that should be fine.
> 
> > Will this approach "just work" for Windows, et al.?
> 
> Yes.  Windows ships with bash (or in Portable Git, busybox sh), which is
> not only required to run the testsuite, but required to invoke editors.
> 
> > > I do, however, think we should require folks to have a suitable hook
> > > that accepts the right arguments.  So "git grep blahblah" isn't a valid
> > > hook in most cases, because it doesn't take the right arguments and read
> > > the right data from stdin if necessary.
> > 
> > I'm not sure how we would guarantee that. Are you suggesting we should
> > try dry running a hook somehow when it's added with 'git hook add'? Even
> > doing that won't stop someone from popping open ~/.gitconfig with nano
> > and adding their hook that doesn't take the right args that way.
> 
> We don't need to guarantee that.  We just need to document it, so that
> (a) people write hooks in the expected way and (b) if people don't, we
> can point them to the documentation and explain why their hooks don't
> work.  I can see people thinking of this as a set of commands that just
> checks exit statuses, and there's likely to be confusion.
> 
> > > I'm just thinking about existing hook wrappers that invoke multiple
> > > scripts at the moment, something like how
> > > https://gist.github.com/mjackson/7e602a7aa357cfe37dadcc016710931b works
> > > at the moment and how we'd replace that with a config-based model.
> > > 
> > > I think using the shell avoids the entire proposal, because it then
> > > becomes trivial to script that in the command and port it over, since we
> > > can use my ugly hack above.
> > 
> > Still not sure I understand what the issue was before. Is it that the
> > trampoline script needs to know $PATH to be able to invoke the child
> > hooks in hookname.d? Or it's because the current working directory
> > isn't clear, so a relative path in the trampoline script may not be
> > resolved well?
> 
> I think we need to have either (a) a way to explicitly invoke hooks
> underneath the hook directory or (b) a shell invocation to allow looking
> up the hook directory.  People want to have per-repository hooks that
> are not a part of the project, and they need a place to store them,
> which is logically the hook directory.
> 
> If we want to allow people to have multiple hooks under something like
> .git/hooks/pre-push.d, then we need to have a way to wire them up in the
> configuration using the correct location of the hook directory instead
> of using a helper script like the one I linked above.
> 
> We don't know that the hook directory will necessarily be under
> .git/hooks, so if the user has moved it elsewhere, we'd want to follow
> that.  A relative path would be wrong if the user changed the hook
> directory to a different location.

Hopefully I am not beating a dead horse here but I really want to
understand. Let me take another guess with examples at what you mean;
please correct me!

For our purposes, let's assume:

.git/hooks/
  update
  update.d/
    update-1
    update-2

update:
  #!/bin/bash

  ./git/hooks/update.d/update-1 &&
  ./git/hooks/update.d/update-2

The goal is to make sure update-1 and update-2 are run when other update
hooks happen.

With my proposal as is, I see two options:

1)
.git/config:
  hook.runHookDir = true

This will run whatever else is in hook.update, and then it will run
.git/hooks/update. This is not ideal because hookDir could change, and
then the content of update will be incorrect:

  git config --add core.hookdir=~/hooks/
  mv .git/hooks/update* ~/hooks/
  # call something which invokes update hooks
  # ~/hooks/update fails, .git/hooks/update-1 is gone :(

2)
.git/config:
  hook.update = 001:/project/path/.git/hooks/update.d/update-1
  hook.update = 002:/project/path/.git/hooks/update.d/update-2

This will run each update hook individually and have no notion of
whether they're in a "hook dir" or not. It sees a path, it hands it to
'sh' to run, it checks the return code.

Now I run 'mv .git/hooks/update* ~/hooks/'. Next time I invoke something
which would run the 'update' hook, it fails, because the path in the
config isn't pointing to anything. But this way is unrelated to hookdir.

It sounds like you might be asking for something like:

.git/config:
  hook.update = 001:__HOOKDIR__/update.d/update-1

I'm not sure that I like the idea of this. My own dream is to eliminate
the need for a hookdir entirely, so it's logically easy for my Gerrit
hook to live in ~/gerrit-tools/ and my linter to live in ~/clang-tidy/
and so on.

I could see option 1 being alleviated by writing it as something like:

update:
  $GIT_HOOKDIR/update.d/update-1 &&
  $GIT_HOOKDIR/update.d/update-2

or
update:
  $(git config core.hookdir)/update.d/update-1 &&
  $(git config core.hookdir)/update.d/update-2

But, I think once you "buy in" to the idea of providing a full path to
the final target - not to a trampoline script - in your config, you
should forget about the idea of "core.hookdir" having anything to do
with it.

To quote you out-of-order now:

> If we want to allow people to have multiple hooks under something like
> .git/hooks/pre-push.d, then we need to have a way to wire them up in the
> configuration using the correct location of the hook directory instead
> of using a helper script like the one I linked above.

I think I may spot the misunderstanding. It sounds like you hope someone
can provide "hook.update=001:~/update.d/" and have all the contents of
update.d executed. I'll be clear and say that I didn't have the
intention to support that at all; instead I was hoping to support a case
like 2. above. Recursing through directories like that sounds scary to
order.

Hmm. Maybe it makes slightly more sense to support something like 'git
hook add update ~/update.d', which provides an editor where you can add
ordering numbers and config scope to each, but resolves to a config
which looks like option 2. above.

Are you pushing hard for this update.d case in the hopes that someone
can 'set and forget' a hook directory like that, and be able to add new
hooks there without changing the configuration? It sounds tricky to me,
as we then get to deal with more questions like:

 - how do we order hooks in that directory?
 - how should 'git hook list' display those hooks?
 - how would you use 'git hook' to reorder those hooks?
 - do we really want a user to 'git pull' and be running an entirely new script
   they didn't mean to, without any additional interaction? (and maybe
   we do - 'git pull' in a repo where you have asked to run hooks from,
   and then hook behavior changes, might not actually be surprising.)

 - Emily

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hook management via 'git hooks' command
  2019-11-26  0:56               ` Emily Shaffer
@ 2019-11-26  2:41                 ` brian m. carlson
  2019-12-02 23:46                   ` Emily Shaffer
  0 siblings, 1 reply; 12+ messages in thread
From: brian m. carlson @ 2019-11-26  2:41 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, Jonathan Nieder, Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 7138 bytes --]

On 2019-11-26 at 00:56:14, Emily Shaffer wrote:
> Hopefully I am not beating a dead horse here but I really want to
> understand. Let me take another guess with examples at what you mean;
> please correct me!
> 
> For our purposes, let's assume:
> 
> .git/hooks/
>   update
>   update.d/
>     update-1
>     update-2
> 
> update:
>   #!/bin/bash
> 
>   ./git/hooks/update.d/update-1 &&
>   ./git/hooks/update.d/update-2
> 
> The goal is to make sure update-1 and update-2 are run when other update
> hooks happen.



> With my proposal as is, I see two options:
> 
> 1)
> .git/config:
>   hook.runHookDir = true
> 
> This will run whatever else is in hook.update, and then it will run
> .git/hooks/update. This is not ideal because hookDir could change, and
> then the content of update will be incorrect:
> 
>   git config --add core.hookdir=~/hooks/
>   mv .git/hooks/update* ~/hooks/
>   # call something which invokes update hooks
>   # ~/hooks/update fails, .git/hooks/update-1 is gone :(
> 

This is actually fine.  We assume the user knows where they want to
store their hooks.  If they change that, then that's intentional.

> .git/config:
>   hook.update = 001:/project/path/.git/hooks/update.d/update-1
>   hook.update = 002:/project/path/.git/hooks/update.d/update-2
> 
> This will run each update hook individually and have no notion of
> whether they're in a "hook dir" or not. It sees a path, it hands it to
> 'sh' to run, it checks the return code.

Correct.  This is the logical porting of the above shell script to the
config syntax if you use an absolute path.

> Now I run 'mv .git/hooks/update* ~/hooks/'. Next time I invoke something
> which would run the 'update' hook, it fails, because the path in the
> config isn't pointing to anything. But this way is unrelated to hookdir.

Yes, that's correct.

It sounds like you're thinking of the config approach as completely
orthogonal to the hook directory.  However, I want to have multiple
per-repository hooks that do not live within the repository but move
with it.  The logical place to store those hooks is in the hook
directory, even if it's not being invoked by Git explicitly.  To use
that with the config approach so I can have multiple hooks in a useful
way that's compatible with other tools, I need some way to refer to
whatever the hook directory is at a given point in time.

> It sounds like you might be asking for something like:
> 
> .git/config:
>   hook.update = 001:__HOOKDIR__/update.d/update-1
> 
> I'm not sure that I like the idea of this. My own dream is to eliminate
> the need for a hookdir entirely, so it's logically easy for my Gerrit
> hook to live in ~/gerrit-tools/ and my linter to live in ~/clang-tidy/
> and so on.

Using the config syntax eliminates per-repository storage for hooks.  I,
personally, want to store multiple one-off hooks in my hooks directory.
I want to use tools that install one-off hooks into my repository
without needing to depend on the location of those tools in the future.
I don't want those hooks to live elsewhere on my file system, since that
makes my repository no longer self contained.

I want to store those hooks in the hook directory, wherever I've
configured that, and whatever that happens to be at this point in time.

I may additionally have tools that live in other locations as well and
may use other syntaxes to invoke them.  For example, I may install a
hook that's provided by a Debian package and refer to it using a bare
program name.

If your goal is to eliminate the hook directory entirely, then I can't
say that I'm in support of that.  A design which does that won't meet my
needs, and it likely won't meet the needs of other users.

The benefit of your proposed config syntax for me is that it provides a
standard way to configure multiple hooks.  I still want many of those
hooks to live in the hook directory, although others may live elsewhere.

> I could see option 1 being alleviated by writing it as something like:
> 
> update:
>   $GIT_HOOKDIR/update.d/update-1 &&
>   $GIT_HOOKDIR/update.d/update-2
> 
> or
> update:
>   $(git config core.hookdir)/update.d/update-1 &&
>   $(git config core.hookdir)/update.d/update-2

This is similar to what I want, and why I want to pass it to the shell.
I can write the following, and everything just works:

.git/config:
  hook.update = 001:$(git config core.hookdir || echo .git/hooks)/update.d/update-1
  hook.update = 002:$(git config core.hookdir || echo .git/hooks)/update.d/update-2

Wherever I have placed my hooks for this repository, I can refer to
them with a shell script.  I can also add the following line as well:

.git/config:
  hook.update = debian-package-hook update

…and everything just works.

> But, I think once you "buy in" to the idea of providing a full path to
> the final target - not to a trampoline script - in your config, you
> should forget about the idea of "core.hookdir" having anything to do
> with it.

I have no intention of providing a full path to any hook, since that's
quite brittle.  There are very few paths on my system which can be
guaranteed not to change, and all of them are things like /bin/sh or
/usr/bin/env.  If my hooks are in the hook directory (or even in the
working tree) with a full path and I move that repository, they become
broken.  If they're shipped by Debian and Debian decides to move the
command, they're broken.

I'm very interested in learning more about why you seem to want to
specify full paths.  It seems very at odds with the way the rest of Git
works.  If the goal is to support other syntaxes in the future, then
let's use a prefix character (e.g. !) to denote commands vs.
non-commands or something like that.  If the goal is security, then I'd
like to hear more about the security model you're trying to achieve with
this design.

> To quote you out-of-order now:
> 
> > If we want to allow people to have multiple hooks under something like
> > .git/hooks/pre-push.d, then we need to have a way to wire them up in the
> > configuration using the correct location of the hook directory instead
> > of using a helper script like the one I linked above.
> 
> I think I may spot the misunderstanding. It sounds like you hope someone
> can provide "hook.update=001:~/update.d/" and have all the contents of
> update.d executed. I'll be clear and say that I didn't have the
> intention to support that at all; instead I was hoping to support a case
> like 2. above. Recursing through directories like that sounds scary to
> order.

I don't need a particular way to do that, since I can do it already, but
I do need a way to wire up multiple hooks that are per-repository and
move with the repository but aren't in the repository.  In other words,
I need a functional replacement for that approach if we're not going to
use that approach itself.

Hopefully I've done a better job at explaining myself here.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hook management via 'git hooks' command
  2019-11-26  2:41                 ` brian m. carlson
@ 2019-12-02 23:46                   ` Emily Shaffer
  0 siblings, 0 replies; 12+ messages in thread
From: Emily Shaffer @ 2019-12-02 23:46 UTC (permalink / raw)
  To: brian m. carlson, git, Jonathan Nieder,
	Ævar Arnfjörð Bjarmason

Thanks for waiting as I took half of last week off to host family for
holiday. :)

On Tue, Nov 26, 2019 at 02:41:47AM +0000, brian m. carlson wrote:
> On 2019-11-26 at 00:56:14, Emily Shaffer wrote:
> > Hopefully I am not beating a dead horse here but I really want to
> > understand. Let me take another guess with examples at what you mean;
> > please correct me!
> > 
> > For our purposes, let's assume:
> > 
> > .git/hooks/
> >   update
> >   update.d/
> >     update-1
> >     update-2
> > 
> > update:
> >   #!/bin/bash
> > 
> >   ./git/hooks/update.d/update-1 &&
> >   ./git/hooks/update.d/update-2
> > 
> > The goal is to make sure update-1 and update-2 are run when other update
> > hooks happen.
> 
> 
> 
> > With my proposal as is, I see two options:
> > 
> > 1)
> > .git/config:
> >   hook.runHookDir = true
> > 
> > This will run whatever else is in hook.update, and then it will run
> > .git/hooks/update. This is not ideal because hookDir could change, and
> > then the content of update will be incorrect:
> > 
> >   git config --add core.hookdir=~/hooks/
> >   mv .git/hooks/update* ~/hooks/
> >   # call something which invokes update hooks
> >   # ~/hooks/update fails, .git/hooks/update-1 is gone :(
> > 
> 
> This is actually fine.  We assume the user knows where they want to
> store their hooks.  If they change that, then that's intentional.
> 
> > .git/config:
> >   hook.update = 001:/project/path/.git/hooks/update.d/update-1
> >   hook.update = 002:/project/path/.git/hooks/update.d/update-2
> > 
> > This will run each update hook individually and have no notion of
> > whether they're in a "hook dir" or not. It sees a path, it hands it to
> > 'sh' to run, it checks the return code.
> 
> Correct.  This is the logical porting of the above shell script to the
> config syntax if you use an absolute path.
> 
> > Now I run 'mv .git/hooks/update* ~/hooks/'. Next time I invoke something
> > which would run the 'update' hook, it fails, because the path in the
> > config isn't pointing to anything. But this way is unrelated to hookdir.
> 
> Yes, that's correct.
> 
> It sounds like you're thinking of the config approach as completely
> orthogonal to the hook directory.  However, I want to have multiple
> per-repository hooks that do not live within the repository but move
> with it.  The logical place to store those hooks is in the hook
> directory, even if it's not being invoked by Git explicitly.  To use
> that with the config approach so I can have multiple hooks in a useful
> way that's compatible with other tools, I need some way to refer to
> whatever the hook directory is at a given point in time.

This sounds like a use case we can expect other users to want to work.
It is very different from the way I think about and use hooks, so I'm
glad you explained it to me.

(For the record, the way I think of hooks is something like this: I want
all repositories to run git-secrets before I push; I want a handful of
repositories to run the Gerrit Change-Id hook after I commit; I want one
repo to run linter A, and three or four other repos to run linter B. So
I tend to think "how can I apply hooks I have downloaded a la carte to
my repositories?" You seem to be thinking "how can I write hooks for one
repository?" They are both totally valid, of course.)

> 
> > It sounds like you might be asking for something like:
> > 
> > .git/config:
> >   hook.update = 001:__HOOKDIR__/update.d/update-1
> > 
> > I'm not sure that I like the idea of this. My own dream is to eliminate
> > the need for a hookdir entirely, so it's logically easy for my Gerrit
> > hook to live in ~/gerrit-tools/ and my linter to live in ~/clang-tidy/
> > and so on.
> 
> Using the config syntax eliminates per-repository storage for hooks.  I,
> personally, want to store multiple one-off hooks in my hooks directory.
> I want to use tools that install one-off hooks into my repository
> without needing to depend on the location of those tools in the future.
> I don't want those hooks to live elsewhere on my file system, since that
> makes my repository no longer self contained.
> 
> I want to store those hooks in the hook directory, wherever I've
> configured that, and whatever that happens to be at this point in time.
> 
> I may additionally have tools that live in other locations as well and
> may use other syntaxes to invoke them.  For example, I may install a
> hook that's provided by a Debian package and refer to it using a bare
> program name.
> 
> If your goal is to eliminate the hook directory entirely, then I can't
> say that I'm in support of that.  A design which does that won't meet my
> needs, and it likely won't meet the needs of other users.

Internally, that is our goal, absolutely. However, you yourself are
proving it's not a reasonable goal for the entire world. That was
somewhat expected, which is why I hoped "hook.runHookDir" and
"hook.warnHookDir" would provide a path for users to decide by
themselves that they don't want to use the hook directory anymore. A
self-paced hook directoy phase out seems like a good fit to make us both
happy, where your pace can be "over my dead body" if you so choose ;)

> 
> The benefit of your proposed config syntax for me is that it provides a
> standard way to configure multiple hooks.  I still want many of those
> hooks to live in the hook directory, although others may live elsewhere.
> 
> > I could see option 1 being alleviated by writing it as something like:
> > 
> > update:
> >   $GIT_HOOKDIR/update.d/update-1 &&
> >   $GIT_HOOKDIR/update.d/update-2
> > 
> > or
> > update:
> >   $(git config core.hookdir)/update.d/update-1 &&
> >   $(git config core.hookdir)/update.d/update-2
> 
> This is similar to what I want, and why I want to pass it to the shell.
> I can write the following, and everything just works:
> 
> .git/config:
>   hook.update = 001:$(git config core.hookdir || echo .git/hooks)/update.d/update-1
>   hook.update = 002:$(git config core.hookdir || echo .git/hooks)/update.d/update-2
> 
> Wherever I have placed my hooks for this repository, I can refer to
> them with a shell script.  I can also add the following line as well:
> 
> .git/config:
>   hook.update = debian-package-hook update
> 
> …and everything just works.
> 
> > But, I think once you "buy in" to the idea of providing a full path to
> > the final target - not to a trampoline script - in your config, you
> > should forget about the idea of "core.hookdir" having anything to do
> > with it.
> 
> I have no intention of providing a full path to any hook, since that's
> quite brittle.  There are very few paths on my system which can be
> guaranteed not to change, and all of them are things like /bin/sh or
> /usr/bin/env.  If my hooks are in the hook directory (or even in the
> working tree) with a full path and I move that repository, they become
> broken.  If they're shipped by Debian and Debian decides to move the
> command, they're broken.

I think this is again stemming from the different ways you and I are
thinking about our hooks, and that's fine. I'm open to the idea of
relative-path hooks, and a blurb in the interactive "git hook edit" mode
(or whatever it ends up being) explaining where relative paths will be
relative to. Is that OK for you?

> I'm very interested in learning more about why you seem to want to
> specify full paths.  It seems very at odds with the way the rest of Git
> works.

I hope my description closer to the top of this mail helps to explain.

> If the goal is to support other syntaxes in the future, then
> let's use a prefix character (e.g. !) to denote commands vs.
> non-commands or something like that.  If the goal is security, then I'd
> like to hear more about the security model you're trying to achieve with
> this design.

As for the security model, I think Jonathan explained it quite well in
the link I posted at the top of this thread. The tl;dr is that Malicious
User can hand Victim Sysadmin a .zip with a repo that is "broken",
and when Victim Sysadmin starts to explore it, she inadvertently
triggers hooks which were packaged in that .zip and compromises her
workstation. Eliminating the use of .git/hooks/ alone doesn't protect
from this - .git/config is also a vector, as one can include an alias
there - but I believe it's heading towards the right direction.

> 
> > To quote you out-of-order now:
> > 
> > > If we want to allow people to have multiple hooks under something like
> > > .git/hooks/pre-push.d, then we need to have a way to wire them up in the
> > > configuration using the correct location of the hook directory instead
> > > of using a helper script like the one I linked above.
> > 
> > I think I may spot the misunderstanding. It sounds like you hope someone
> > can provide "hook.update=001:~/update.d/" and have all the contents of
> > update.d executed. I'll be clear and say that I didn't have the
> > intention to support that at all; instead I was hoping to support a case
> > like 2. above. Recursing through directories like that sounds scary to
> > order.
> 
> I don't need a particular way to do that, since I can do it already, but
> I do need a way to wire up multiple hooks that are per-repository and
> move with the repository but aren't in the repository.  In other words,
> I need a functional replacement for that approach if we're not going to
> use that approach itself.

It sounds like simply supporting relative paths in the config, with
adequate explanation to the user about where the paths are relative to,
will solve this for you. What do you think?

> 
> Hopefully I've done a better job at explaining myself here.

Yeah, I think I made a breakthrough understanding you from this mail.
Thanks :) :)

 - Emily


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-12-02 23:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-16  1:11 [RFC] Hook management via 'git hooks' command Emily Shaffer
2019-11-16  5:45 ` brian m. carlson
2019-11-18 22:38   ` Emily Shaffer
2019-11-19  0:51     ` brian m. carlson
2019-11-23  1:19       ` Emily Shaffer
2019-11-25  3:04         ` brian m. carlson
2019-11-25 22:21           ` Emily Shaffer
2019-11-25 22:45             ` Emily Shaffer
2019-11-26  0:28             ` brian m. carlson
2019-11-26  0:56               ` Emily Shaffer
2019-11-26  2:41                 ` brian m. carlson
2019-12-02 23:46                   ` Emily Shaffer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).