All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] add post-fetch hook
@ 2011-12-24 23:42 Joey Hess
  2011-12-25  3:13 ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Joey Hess @ 2011-12-24 23:42 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 4164 bytes --]

The post-fetch hook is fed on its stdin all refs that were newly fetched.
It is not allowed to abort the fetch (or pull), but can modify what
was fetched or take other actions.

One example use of this hook is to automatically merge certain remote
branches into a local branch. Another is to update a local cache
(such as a search index) with the fetched refs. No other hook is run
near fetch time, except for post-merge, which doesn't always run after a
fetch, which is why this additional hook is useful.

Signed-off-by: Joey Hess <joey@kitenet.net>

---

The #1 point of confusion for git-annex users is the need to run
"git annex merge" after fetching. That does a union merge of newly
fetched remote git-annex branches into the local git-annex branch.
If a user does a "git pull; ... ; git push" and forgets to git annex merge
in between, their push often fails as the git-annex branches have diverged.
With this hook, that confusing step can be eliminated.

Since git annex merge could be run at any point between fetch and push,
I considered several different hooks, including this one, a pre-push hook,
and a variant of this hook that does not feed the hook any information
on stdin. I chose this one, with the information on stdin because it seems
the most generally useful, and will let me make git annex merge slightly
more optimal than it would be without the stdin.

 Documentation/githooks.txt |   12 ++++++++++
 builtin/fetch.c            |   50 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 28edefa..96a588c 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -162,6 +162,18 @@ This hook can be used to perform repository validity checks, auto-display
 differences from the previous HEAD if different, or set working dir metadata
 properties.
 
+post-fetch
+~~~~~~~~~~
+
+This hook is invoked by 'git fetch' (commonly called by 'git pull'), after
+refs have been fetched from the remote repository. It takes no arguments,
+but is fed a list of new or updated refs on its standard input. This hook
+cannot affect the outcome of 'git fetch' and is not executed, if nothing
+was fetched.
+
+This hook can make modifications to the fetched refs, or take other
+actions.
+
 post-merge
 ~~~~~~~~~~
 
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 33ad3aa..d813b8e 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -89,6 +89,52 @@ static struct option builtin_fetch_options[] = {
 	OPT_END()
 };
 
+static const char post_fetch_hook[] = "post-fetch";
+struct ref *fetched_refs = NULL;
+void run_post_fetch_hook (void) {
+	struct ref *ref;
+	struct child_process proc;
+	const char *argv[2];
+	FILE *f;
+
+	if (! fetched_refs)
+		return;
+
+	argv[0] = git_path("hooks/%s", post_fetch_hook);
+	if (access(argv[0], X_OK) < 0)
+		return;
+	argv[1] = NULL;
+
+	memset(&proc, 0, sizeof(proc));
+	proc.argv = argv;
+	proc.in = -1;
+	proc.stdout_to_stderr = 1;
+	if (start_command(&proc) != 0)
+		return;
+
+	f = fdopen(proc.in, "w");
+	if (f == NULL) {
+		close(proc.in);
+		goto cleanup;
+	}
+	for (ref = fetched_refs; ref; ref = ref->next)
+		fprintf(f, "%s\n", ref->name);
+	fclose(f);
+
+cleanup:
+	free_refs(fetched_refs);
+	fetched_refs = NULL;
+
+	finish_command(&proc);
+	close(proc.in);
+}
+
+void post_fetch_hook_observe (const struct ref *fetched_ref) {
+	struct ref *ref = copy_ref(fetched_ref);
+	ref->next = fetched_refs;
+	fetched_refs = ref;
+}
+
 static void unlock_pack(void)
 {
 	if (transport)
@@ -233,6 +279,7 @@ static int s_update_ref(const char *action,
 	if (write_ref_sha1(lock, ref->new_sha1, msg) < 0)
 		return errno == ENOTDIR ? STORE_REF_ERROR_DF_CONFLICT :
 					  STORE_REF_ERROR_OTHER;
+	post_fetch_hook_observe(ref);
 	return 0;
 }
 
@@ -755,6 +802,9 @@ static int do_fetch(struct transport *transport,
 		free_refs(ref_map);
 	}
 
+	/* Run hook only after fetching all refs. */
+	run_post_fetch_hook();
+
 	return 0;
 }
 
-- 
1.7.7.3

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-24 23:42 [PATCH] add post-fetch hook Joey Hess
@ 2011-12-25  3:13 ` Junio C Hamano
  2011-12-25  3:50   ` Joey Hess
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2011-12-25  3:13 UTC (permalink / raw)
  To: Joey Hess; +Cc: git

Joey Hess <joey@kitenet.net> writes:

> The post-fetch hook is fed on its stdin all refs that were newly fetched.
> It is not allowed to abort the fetch (or pull), but can modify what
> was fetched or take other actions.
>
> One example use of this hook is to automatically merge certain remote
> branches into a local branch. Another is to update a local cache
> (such as a search index) with the fetched refs. No other hook is run
> near fetch time, except for post-merge, which doesn't always run after a
> fetch, which is why this additional hook is useful.
>
> Signed-off-by: Joey Hess <joey@kitenet.net>

A typical "'git pull' invokes 'git fetch' and then lets 'git merge' (or
'git rebase') integrate the work on the current branch with what was
fetched" sequence goes like this:

 - 'git fetch':
  . grabs the necessary objects from the remote;
  . decides what remote tracking branches are updated to point
    at what objects, and what updates are to be denied;
  . updates remote tracking branches accordingly; and
  . writes $GIT_DIR/FETCH_HEAD to communicate what have been fetched
    and what are to be merged.

 - 'git merge':
  . reads $GIT_DIR/FETCH_HEAD to learn what commits to be merged; and
  . merges the commits to the current branch.

Even though we do not add arbitrary hooks on the client side that could
easily be implemented by wrapping the client side commands (i.e. you could
implement "git myfetch" that runs "git fetch" followed by whatever script
that mucks with the result of the fetch) in general, I can see that it
would be useful to have a hook that can tweak the result of the fetch run
inside of the "git pull", because you cannot tell "git pull" to run "git
myfetch" instead of "git fetch".

Because the sequence of "git fetch" followed by "git merge", both commands
issued by the end user, should be equivalent to "git pull" from an end
user's point of view, the hook must be called from near the end of "git
fetch" if we were to have such a hook that tweaks the result of the fetch
inside "pull". IOW, the implementation, even though logically it belongs
to "pull", has to be inside "fetch", not "pull".

In that sense, I am not fundamentally opposed to the idea of adding a post
fetch hook that allows tweaking of the result.

*HOWEVER*

If we _were_ to sanction the use of the hook to tweak the result, I do not
want to see it implemented as an ad-hoc hack that tells the hook writers
that it is _entirely_ their responsiblity to update the remote tracking
branches from what it fetched, and also update $GIT_DIR/FETCH_HEAD to
maintain consistency between these two places.

A very cursory look at the patch tells me that there are a few problems
with it.  It does not seem to affect what will go to $GIT_DIR/FETCH_HEAD
at all, and hence it does not have any way to affect the result of the
fetch that does not store it to any of our remote tracking branches.

> The #1 point of confusion for git-annex users is the need to run
> "git annex merge" after fetching. That does a union merge of newly
> fetched remote git-annex branches into the local git-annex branch.

That use case sounds like that "git fetch" is called as a first class UI,
which is covered by "git myfetch" (you can call it "git annex fetch")
wrapper approach, the canonical example of a hook that we explicitly do
not want to add. It also does not seem to call for mucking with the result
of the fetch at all.

Perhaps the two concepts should be separated into different hooks?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-25  3:13 ` Junio C Hamano
@ 2011-12-25  3:50   ` Joey Hess
  2011-12-25  9:30     ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Joey Hess @ 2011-12-25  3:50 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1852 bytes --]

Junio C Hamano wrote:
> If we _were_ to sanction the use of the hook to tweak the result, I do not
> want to see it implemented as an ad-hoc hack that tells the hook writers
> that it is _entirely_ their responsiblity to update the remote tracking
> branches from what it fetched, and also update $GIT_DIR/FETCH_HEAD to
> maintain consistency between these two places.
> 
> A very cursory look at the patch tells me that there are a few problems
> with it.  It does not seem to affect what will go to $GIT_DIR/FETCH_HEAD
> at all, and hence it does not have any way to affect the result of the
> fetch that does not store it to any of our remote tracking branches.

True, it does not update FETCH_HEAD. I had not considered using the hook
that way.

I suppose that after running the hook, fetch could check each remote
tracking branch for a new value, and only then write to FETCH_HEAD.

> > The #1 point of confusion for git-annex users is the need to run
> > "git annex merge" after fetching. That does a union merge of newly
> > fetched remote git-annex branches into the local git-annex branch.
> 
> That use case sounds like that "git fetch" is called as a first class UI,
> which is covered by "git myfetch" (you can call it "git annex fetch")
> wrapper approach, the canonical example of a hook that we explicitly do
> not want to add. It also does not seem to call for mucking with the result
> of the fetch at all.

Most users are fetching by calling git pull as part of their normal
workflow. I would like to avoid git-annex needing its own special pull
command. For one thing, there can be many programs that use git branches
in similar ways (another one is pristine-tar), and a user shouldn't have
to run multiple wrapped versions of git fetch or pull when using
multiple such programs.

-- 
see shy jo

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-25  3:50   ` Joey Hess
@ 2011-12-25  9:30     ` Junio C Hamano
  2011-12-25 16:24       ` Jakub Narebski
                         ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Junio C Hamano @ 2011-12-25  9:30 UTC (permalink / raw)
  To: Joey Hess; +Cc: git

Joey Hess <joey@kitenet.net> writes:

> Junio C Hamano wrote:
>> If we _were_ to sanction the use of the hook to tweak the result, I do not
>> want to see it implemented as an ad-hoc hack that tells the hook writers
>> that it is _entirely_ their responsiblity to update the remote tracking
>> branches from what it fetched, and also update $GIT_DIR/FETCH_HEAD to
>> maintain consistency between these two places.
>> 
>> A very cursory look at the patch tells me that there are a few problems
>> with it.  It does not seem to affect what will go to $GIT_DIR/FETCH_HEAD
>> at all, and hence it does not have any way to affect the result of the
>> fetch that does not store it to any of our remote tracking branches.
>
> True, it does not update FETCH_HEAD. I had not considered using the hook
> that way.

Perhaps I misread what you wrote in the commit log message then.  I
somehow got an impression that one of the advertised way for the hook to
be used was to lie which commits the remote tracking refs point at by
letting it run "update-ref refs/*/*" and the lie is later picked up by
re-reading them, but I wasn't reading the patch very carefully.

If your use case does not involve updating the remote tracking refs nor
FETCH_HEAD (updating only one and not the other is a no-starter), then we
should explicitly forbid it in the documentation, as allowing updates will
invite inconsistencies.

Although I do not deeply care between such a "trigger to only notify, no
touching" hook and a full-blown "allow hook writers to easily lie about
what happened in the fetch" hook, I was hoping that we would get this
right and useful if we were spending our brain bandwidth on it. I am not
very fond of an easier "trigger to only notify" hook because people are
bound to misuse the interface and try updating the refs anyway, making it
easy to introduce inconsistencies between refs and FETCH_HEAD that will
confuse the later "merge" step.

As hook writers are more prone to write such an incorrect code than people
who implement the mechanism to call hooks on the git-core side, the more
the hook interface helps hook writers to avoid such mistakes, the better.

So if we were to allow the hook to lie what commits were fetched and store
something different from what we fetched in the remote tracking refs, I
think the correct place to do so would be in store_updated_refs(),
immediately before we call check_everything_connected().

 - Feed the contents of the ref_map to the hook. For each entry, the hook
   would get (at least):
   . the object name;
   . the refname at the remote;
   . the refname at the local (which could be empty when we are not
     copying it to any of our local ref); and
   . if the entry is to be used for merge.

 - The hook must read _everything_ from its standard input, and then
   return the
   re-written result in the same format as its input. The hook could
   . update the object name (i.e. re-point the remote tracking ref);
   . update the local refname (i.e. use different remote tracking ref);
   . change "merge" flag between true/false; and/or
   . add or remove entries

 - You read from the hook and replace the ref_map list that is fed to
   check_everything_connected(). This ref_map list is what is used in the
   next for() loop that calls update_local_ref() to update the remote
   tracking ref, records the entry in FETCH_HEAD, and produces the report.

This way, the hook cannot screw up, as what it tells us will consistently
be written by us to where it should go.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-25  9:30     ` Junio C Hamano
@ 2011-12-25 16:24       ` Jakub Narebski
  2011-12-25 18:06         ` Joey Hess
  2011-12-26  8:09         ` Junio C Hamano
  2011-12-25 17:54       ` Joey Hess
  2011-12-26  2:31       ` Joey Hess
  2 siblings, 2 replies; 16+ messages in thread
From: Jakub Narebski @ 2011-12-25 16:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Joey Hess, git

Junio C Hamano <gitster@pobox.com> writes:

> [...] if we were to allow the hook to lie what commits were fetched and store
> something different from what we fetched in the remote tracking refs, I
> think the correct place to do so would be in store_updated_refs(),
> immediately before we call check_everything_connected().
> 
>  - Feed the contents of the ref_map to the hook. For each entry, the hook
>    would get (at least):
>    . the object name;
>    . the refname at the remote;
>    . the refname at the local (which could be empty when we are not
>      copying it to any of our local ref); and
>    . if the entry is to be used for merge.
> 
>  - The hook must read _everything_ from its standard input, and then
>    return the
>    re-written result in the same format as its input. The hook could
>    . update the object name (i.e. re-point the remote tracking ref);
>    . update the local refname (i.e. use different remote tracking ref);
>    . change "merge" flag between true/false; and/or
>    . add or remove entries

This is a very nice idea, IMHO, both because it makes it simple to
implement no-op (example) hook by just using "cat", and beause it
makes possible to stack such hooks (e.g. one from git-annex with the
one from pristine-tar etc.).

One thing that needs to be specified is what should happen if the hook
changes "the refname at the remote" part...
 
>  - You read from the hook and replace the ref_map list that is fed to
>    check_everything_connected(). This ref_map list is what is used in the
>    next for() loop that calls update_local_ref() to update the remote
>    tracking ref, records the entry in FETCH_HEAD, and produces the report.
> 
> This way, the hook cannot screw up, as what it tells us will consistently
> be written by us to where it should go.

-- 
Jakub Narebski

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-25  9:30     ` Junio C Hamano
  2011-12-25 16:24       ` Jakub Narebski
@ 2011-12-25 17:54       ` Joey Hess
  2011-12-26  2:31       ` Joey Hess
  2 siblings, 0 replies; 16+ messages in thread
From: Joey Hess @ 2011-12-25 17:54 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2987 bytes --]

Junio C Hamano wrote:
> So if we were to allow the hook to lie what commits were fetched and store
> something different from what we fetched in the remote tracking refs, I
> think the correct place to do so would be in store_updated_refs(),
> immediately before we call check_everything_connected().
> 
>  - Feed the contents of the ref_map to the hook. For each entry, the hook
>    would get (at least):
>    . the object name;
>    . the refname at the remote;
>    . the refname at the local (which could be empty when we are not
>      copying it to any of our local ref); and
>    . if the entry is to be used for merge.
> 
>  - The hook must read _everything_ from its standard input, and then
>    return the
>    re-written result in the same format as its input. The hook could
>    . update the object name (i.e. re-point the remote tracking ref);
>    . update the local refname (i.e. use different remote tracking ref);
>    . change "merge" flag between true/false; and/or
>    . add or remove entries
> 
>  - You read from the hook and replace the ref_map list that is fed to
>    check_everything_connected(). This ref_map list is what is used in the
>    next for() loop that calls update_local_ref() to update the remote
>    tracking ref, records the entry in FETCH_HEAD, and produces the report.
> 
> This way, the hook cannot screw up, as what it tells us will consistently
> be written by us to where it should go.

This is a good plan, the only problem I see with it is that
store_updated_refs is potentially called twice in a fetch, when the
automated tag following is done. I don't see that as a large problem,
perhaps it could even be optimised away.

The format of .git/FETCH_HEAD does not seem appropriate for this hook
to use (it's not documented, and it doesn't quite have all the necessary
fields, particularly missing the local refname). Instead, how about this,
for the hook's input/output format?

<sha1> SP <not-for-merge|merge> SP <remote-refname> SP <local-refname> LF

Example:

5d6dfc7cb140a6eb90138334fab2245b69bc8bc4 merge refs/heads/master refs/remotes/origin/master
f95247ea15bc62a2dab0f6ae3cd247267a0639b8 not-for-merge refs/heads/pu refs/remotes/origin/pu
2ce0edcd786b790fed580e7df56291619834d276 not-for-merge refs/tags/v1.7.8.1 refs/tags/v1.7.8.1

Allowing the hook to change the merge flag does open up some other
interesting uses of the hook. I can now think of three use cases for it:

1. Only accepting tags that meet some criteria, such as being signed
   by a trusted signature.
2. Causing branches that would not normally be merged to get merged.
   For example, a hook could set the merge flag on a branch when it was
   pulled from a remote other than branch.master.remote. This could be useful
   when using git without a single central origin and with a number of
   repositories that are always wanted to be kept merged.
3. My git annex merge case.

-- 
see shy jo

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-25 16:24       ` Jakub Narebski
@ 2011-12-25 18:06         ` Joey Hess
  2011-12-26  8:09         ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Joey Hess @ 2011-12-25 18:06 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 952 bytes --]

Jakub Narebski wrote:
> This is a very nice idea, IMHO, both because it makes it simple to
> implement no-op (example) hook by just using "cat", and beause it
> makes possible to stack such hooks (e.g. one from git-annex with the
> one from pristine-tar etc.).

Indeed.

> One thing that needs to be specified is what should happen if the hook
> changes "the refname at the remote" part...

I'm not sure what the use case is for including the remote's refname is yet.
Changing it would allow changing what's written into FETCH_HEAD though.
For example:

-2ce0edcd786b790fed580e7df56291619834d276        not-for-merge   branch 'maint' of git://git.kernel.org/pub/scm/git/git
+2ce0edcd786b790fed580e7df56291619834d276        not-for-merge   branch 'jc-maint' of git://git.kernel.org/pub/scm/git/git

And that would in turn be used by a few things that consume that
information. Whether that's useful, I don't know.

-- 
see shy jo

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] add post-fetch hook
  2011-12-25  9:30     ` Junio C Hamano
  2011-12-25 16:24       ` Jakub Narebski
  2011-12-25 17:54       ` Joey Hess
@ 2011-12-26  2:31       ` Joey Hess
  2011-12-26  7:59         ` Junio C Hamano
  2011-12-27 21:27         ` Johannes Sixt
  2 siblings, 2 replies; 16+ messages in thread
From: Joey Hess @ 2011-12-26  2:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

The post-fetch hook is fed lines on stdin for all refs that were fetched, and
outputs on stdout possibly modified lines. Its output is parsed and used
when git fetch updates the remote tracking refs, records the entries in
FETCH_HEAD, and produces its report.

---

Not quite ready to sign off on this yet, but it does work.
Comments and code review appreciated.

Demo:

joey@gnu:~/tmp/demo>cat .git/hooks/post-fetch
#!/bin/sh
# Rename branches, and block all tags.
sed 's!foo!bar!g' | grep -v refs/tags/
joey@gnu:~/tmp/demo>chmod +x .git/hooks/post-fetch
joey@gnu:~/tmp/demo>git pull
From /home/joey/tmp/a
 * [new branch]      bar        -> origin/bar
Already up-to-date.
joey@gnu:~/tmp/demo>chmod -x .git/hooks/post-fetch
joey@gnu:~/tmp/demo>git pull
From /home/joey/tmp/a
 * [new branch]      foo        -> origin/foo
 * [new tag]         v1.0       -> v1.0
Already up-to-date.
joey@gnu:~/tmp/demo>chmod +x .git/hooks/post-fetch
joey@gnu:~/tmp/demo>git remote update --prune
Fetching origin
 x [deleted]         (none)     -> origin/foo


 Documentation/githooks.txt |   29 ++++++
 builtin/fetch.c            |  228 ++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 238 insertions(+), 19 deletions(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 28edefa..9c2b6bf 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -162,6 +162,35 @@ This hook can be used to perform repository validity checks, auto-display
 differences from the previous HEAD if different, or set working dir metadata
 properties.
 
+post-fetch
+~~~~~~~~~~
+
+This hook is invoked by 'git fetch' (commonly called by 'git pull'), after
+refs have been fetched from the remote repository. It is not executed, if
+nothing was fetched.
+
+It takes no arguments, but is fed a line of the following format on
+its standard input for each ref that was fetched.
+
+  <sha1> SP not-for-merge|merge SP <remote-refname> SP <local-refname> LF
+
+Where the "not-for-merge" flag indicates the ref is not to be merged into the
+current branch, and the "merge" flag indicates that 'git merge' should
+later merge it. The `<remote-refname>` is the remote's name for the ref
+that was pulled, and `<local-refname>` is a name of a remote-tracking branch,
+like "refs/remotes/origin/master", or can be empty if the fetched ref is not
+being stored in a local refname.
+
+The hook must consume all of its standard input, and output back lines
+of the same format. It can modify its input as desired, including
+adding or removing lines, updating the sha1 (i.e. re-point the
+remote-tracking branch), changing the merge flag, and changing the
+`<local-refname>` (i.e. use different remote-tracking branch).
+
+The output of the hook is used to update the remote-tracking branches, and
+`.git/FETCH_HEAD`, in preparation for for a later merge operation done by
+'git merge'.
+
 post-merge
 ~~~~~~~~~~
 
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 33ad3aa..aa401b2 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -89,6 +89,188 @@ static struct option builtin_fetch_options[] = {
 	OPT_END()
 };
 
+static int add_existing(const char *refname, const unsigned char *sha1,
+			int flag, void *cbdata)
+{
+	struct string_list *list = (struct string_list *)cbdata;
+	struct string_list_item *item = string_list_insert(list, refname);
+	item->util = (void *)sha1;
+	return 0;
+}
+	
+static const char post_fetch_hook[] = "post-fetch";
+struct ref *post_fetch_hook_refs;
+
+int feed_post_fetch_hook (int in, int out, void *data)
+{
+	struct ref *ref;
+	struct strbuf buf = STRBUF_INIT;
+	int ret;
+
+	for (ref = post_fetch_hook_refs; ref; ref = ref->next) {
+		strbuf_addstr(&buf, sha1_to_hex(ref->old_sha1));
+		strbuf_addch(&buf, ' ');
+		strbuf_addstr(&buf, ref->merge ? "merge" : "not-for-merge");
+		strbuf_addch(&buf, ' ');
+		if (ref->name)
+			strbuf_addstr(&buf, ref->name);
+		strbuf_addch(&buf, ' ');
+		if (ref->peer_ref && ref->peer_ref->name)
+			strbuf_addstr(&buf, ref->peer_ref->name);
+		strbuf_addch(&buf, '\n');
+	}
+
+	ret = write_in_full(out, buf.buf, buf.len) != buf.len;
+	if (ret)
+		warning("%s hook failed to consume all its input",
+				post_fetch_hook);
+	close(out);
+	strbuf_release(&buf);
+	return ret;
+}
+	
+struct ref *parse_post_fetch_hook_line (char *l, 
+		struct string_list *existing_refs)
+{
+	struct ref *ref = NULL, *peer_ref = NULL;
+	struct string_list_item *peer_item = NULL;
+	char *words[4];
+	int i, word=0;
+	char *problem;
+
+	for (i=0; l[i]; i++) {
+		if (isspace(l[i])) {
+			l[i]='\0';
+			words[word]=l;
+			l+=i+1;
+			i=0;
+			word++;
+			if (word > 3) {
+				problem="too many words";
+				goto unparsable;
+			}
+		}
+	}
+	if (word < 3) {
+		problem="not enough words";
+		goto unparsable;
+	}
+	
+	ref = alloc_ref(words[2]);
+	peer_ref = ref->peer_ref = alloc_ref(l);
+	ref->peer_ref->force=1;
+
+	if (get_sha1_hex(words[0], ref->old_sha1)) {
+		problem="bad sha1";
+		goto unparsable;
+	}
+
+	if (strcmp(words[1], "merge") == 0) {
+		ref->merge=1;
+	}
+	else if (strcmp(words[1], "not-for-merge") != 0) {
+		problem="bad merge flag";
+		goto unparsable;
+	}
+
+	peer_item = string_list_lookup(existing_refs, peer_ref->name);
+	if (peer_item)
+		hashcpy(peer_ref->old_sha1, peer_item->util);
+
+	return ref;
+
+ unparsable:
+	warning("%s hook output a wrongly formed line: %s",
+			post_fetch_hook, problem);
+	free(ref);
+	free(peer_ref);
+	return NULL;
+}
+
+/* The hook is fed lines of the form:
+ * <sha1> SP <not-for-merge|merge> SP <remote-refname> SP <local-refname> LF
+ * And should output rewritten lines of the same form.
+ */
+struct ref *run_post_fetch_hook (struct ref *fetched_refs)
+{
+	struct ref *new_refs = NULL;
+	struct string_list existing_refs = STRING_LIST_INIT_NODUP;
+	struct child_process hook;
+	struct async async;
+	const char *argv[2];
+	FILE *f;
+	struct strbuf buf;
+	struct ref *ref, *prevref=NULL;
+	int ok = 1;
+
+	if (! fetched_refs)
+		return fetched_refs;
+
+	argv[0] = git_path("hooks/%s", post_fetch_hook);
+	if (access(argv[0], X_OK) < 0)
+		return fetched_refs;
+	argv[1] = NULL;
+
+	memset(&hook, 0, sizeof(hook));
+	hook.argv = argv;
+	hook.in = -1;
+	hook.out = -1;
+	if (start_command(&hook) != 0)
+		return fetched_refs;
+
+	/* Use an async writer to feed the hook process.
+	 * This allows the hook to read and write a line at
+	 * a time without blocking. */
+	memset(&async, 0, sizeof(async));
+	async.proc = feed_post_fetch_hook;
+	post_fetch_hook_refs = fetched_refs;
+	async.out = hook.in;
+	if (start_async(&async))
+		goto failed_hook;
+
+	for_each_ref(add_existing, &existing_refs);
+
+	f = fdopen(hook.out, "r");
+	if (f == NULL)
+		goto failed_hook;
+	strbuf_init(&buf, 128);
+	while (strbuf_getline(&buf, f, '\n') != EOF) {
+		char *l = strbuf_detach(&buf, NULL);
+		ref = parse_post_fetch_hook_line(l, &existing_refs);
+		if (ref) {
+			if (prevref) {
+				prevref->next=ref;
+				prevref=ref;
+			}
+			else {
+				new_refs = prevref = ref;
+			}
+		}
+		else {
+			ok = 0; /* ignore the other output */
+		}
+		free(l);
+	}
+	strbuf_release(&buf);
+	fclose(f);
+
+	if (finish_async(&async))
+		goto failed_hook;
+	finish_command(&hook);
+
+	if (! ok)
+		return fetched_refs;
+	/* The new_refs are returned, to be used in place of fetched_refs,
+	 * so it is not needed anymore and can be freed here. */
+	free_refs(fetched_refs);
+	return new_refs;
+
+failed_hook:
+	close(hook.out);
+	finish_command(&hook);
+	return fetched_refs;
+}
+
 static void unlock_pack(void)
 {
 	if (transport)
@@ -507,17 +689,30 @@ static int quickfetch(struct ref *ref_map)
 	return check_everything_connected(iterate_ref_map, 1, &rm);
 }
 
-static int fetch_refs(struct transport *transport, struct ref *ref_map)
+struct fetch_refs_result {
+	struct ref *new_refs;
+	int status;
+};
+
+static struct fetch_refs_result fetch_refs(struct transport *transport,
+		struct ref *ref_map)
 {
-	int ret = quickfetch(ref_map);
-	if (ret)
-		ret = transport_fetch_refs(transport, ref_map);
-	if (!ret)
-		ret |= store_updated_refs(transport->url,
+	struct fetch_refs_result res;
+	res.status = quickfetch(ref_map);
+	if (res.status)
+		res.status = transport_fetch_refs(transport, ref_map);
+	if (!res.status) {
+		res.new_refs = run_post_fetch_hook(ref_map);
+
+		res.status |= store_updated_refs(transport->url,
 				transport->remote->name,
-				ref_map);
+				res.new_refs);
+	}
+	else {
+		res.new_refs = ref_map;
+	}
 	transport_unlock_pack(transport);
-	return ret;
+	return res;
 }
 
 static int prune_refs(struct refspec *refs, int ref_count, struct ref *ref_map)
@@ -542,15 +737,6 @@ static int prune_refs(struct refspec *refs, int ref_count, struct ref *ref_map)
 	return result;
 }
 
-static int add_existing(const char *refname, const unsigned char *sha1,
-			int flag, void *cbdata)
-{
-	struct string_list *list = (struct string_list *)cbdata;
-	struct string_list_item *item = string_list_insert(list, refname);
-	item->util = (void *)sha1;
-	return 0;
-}
-
 static int will_fetch(struct ref **head, const unsigned char *sha1)
 {
 	struct ref *rm = *head;
@@ -673,6 +859,7 @@ static int do_fetch(struct transport *transport,
 	struct string_list_item *peer_item = NULL;
 	struct ref *ref_map;
 	struct ref *rm;
+	struct fetch_refs_result res;
 	int autotags = (transport->remote->fetch_tags == 1);
 
 	for_each_ref(add_existing, &existing_refs);
@@ -710,7 +897,9 @@ static int do_fetch(struct transport *transport,
 
 	if (tags == TAGS_DEFAULT && autotags)
 		transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
-	if (fetch_refs(transport, ref_map)) {
+	res = fetch_refs(transport, ref_map);
+	ref_map = res.new_refs;
+	if (res.status) {
 		free_refs(ref_map);
 		return 1;
 	}
@@ -750,7 +939,8 @@ static int do_fetch(struct transport *transport,
 		if (ref_map) {
 			transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
 			transport_set_option(transport, TRANS_OPT_DEPTH, "0");
-			fetch_refs(transport, ref_map);
+			res = fetch_refs(transport, ref_map);
+			ref_map = res.new_refs;
 		}
 		free_refs(ref_map);
 	}

-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-26  2:31       ` Joey Hess
@ 2011-12-26  7:59         ` Junio C Hamano
  2011-12-26 15:51           ` Joey Hess
  2011-12-27 21:27         ` Johannes Sixt
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2011-12-26  7:59 UTC (permalink / raw)
  To: Joey Hess; +Cc: git

Joey Hess <joey@kitenet.net> writes:

> The post-fetch hook is fed lines on stdin for all refs that were fetched, and
> outputs on stdout possibly modified lines. Its output is parsed and used
> when git fetch updates the remote tracking refs, records the entries in
> FETCH_HEAD, and produces its report.
>
> ---
>
> Not quite ready to sign off on this yet, but it does work.
> Comments and code review appreciated.

Thanks. I do have a few comments.

This hook is no longer a "post" fetch hook. The mechanism lets the object
transfer phase does its work and then rewrites/tweaks the result before
fetch completes. To an outside observer, what the hook does is an integral
part of what "fetch" does, and not something that happens _after_ fetch
completes. I am bad at naming things, but something along the lines of
"tweak-fetch" that makes it clear that what happens in the hook is still
part of the fetching may be a more appropriate name, methinks.

I very much on purpose said that the hook "must read everything from its
standard input, *and* *then* return..." in my response. Your "Demo" hook
emits output as it reads its input with sed, but your main process invokes
the hook, drains everything with write_in_full() before starting to read a
single line, so I suspect that your hook will deadlock when its output
pipe buffer fills up without being read by the main process. Of course,
for this deadlock to actually happen, you need to be fetching quite a lot
of refs.

To make the life of hook writers easier, however, it would be good to
support a hook written in the style of your "Demo" hook, instead of
requiring it to read everything before emiting any output. I think you
could solve this by having a select(2) loop to avoid the deadlock on our
end (lets call the code that spawns a hook with run_command() and
interacts with it a "hook driver" in the rest of this message), but before
going in that direction, I would like to see us stepping back and bit and
think about the way hooks are called in the current code.

We seem to already have too many hook drivers, each of which hand-roll
similar logic using run-command API. At some point, we would want to have
a single "run_hook" helper function that takes:

 - the name of a hook;
 - the command line arguments to be fed;
 - a callback "generator" function to feed the standard input stream of
   the hook process;
 - a callback "consumer" function to receive the standard output stream of
   the hook process; and
 - set of environment tweaks while running the hook (e.g. run the hook
   while setting GIT_INDEX_FILE to a temporary file).

and hides away the complexity from hook drivers.  The command line
arguments, input and output callback functions are all optional depending
on the API between the hook driver and the hook (e.g. the "post-update"
hook takes arguments from its command line but does not interact with the
standard I/O stream, while the "post-receive" hook takes its input from
the standard input stream). Tweaking of the environment is also optional;
not many hooks need it.

By formalizing the hook driver API that way, any hook driver that drives a
tricky hook that may need a select(2) loop to avoid a deadlock in a way
similar to your patch do would not have to worry about the issue, as the
run_hook() helper would take care of it by reading from the hook's output
pipe and drain the pipe by calling the "consumer" callback before calling
the "generator" callback and feed more input to the hook to cause a
deadlock. We also could in the future do many other things if and when we
wanted to that the current code structure makes difficult. A few examples
that readily come to my mind are:

 - relocate where the hook scripts live, by limiting the hook driver API
   to take just the "name" of the hook. The current hook callsites know
   that the hooks live in git_path("hooks/$name") and call run_command()
   on their own, but the run_hook() helper could redirect it away to
   somewhere else, e.g. "/etc/git-core/hooks/$name".

 - run a set of hooks on the same triggering condition. You may want to
   have two "post-receive" hooks, one to feed an e-mail based notification
   system and another to drive an autobuilder, for example. For this to
   work, the "generator" and "consumer" callbacks need to have a way for
   us to tell "beginning of a new session" and "end of a new session", so
   that they can produce/consume the same set of values more than once.

 - perhaps ignore SIGPIPE if the hook chooses not to read any information
   the hook driver provides with it.

I have been wondering when would be the good time to refactor the hook
driver API.  We can add your patch, after polishing it enough to make it
ready for inclusion, independent of the hook API refactoring. But that
would mean that it would require more work when refactoring the hook API,
as we would have one more hand-rolled hook caller that is based on
run_command().

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-25 16:24       ` Jakub Narebski
  2011-12-25 18:06         ` Joey Hess
@ 2011-12-26  8:09         ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2011-12-26  8:09 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Joey Hess, git

Jakub Narebski <jnareb@gmail.com> writes:

> One thing that needs to be specified is what should happen if the hook
> changes "the refname at the remote" part...

Hmm, good eyes.

The mechanism is to allow the hook to rewrite what happened during the
fetch. If we decided refs/heads/master on the remote that points at the
commit $X should update refs/remotes/origin/master, we tell that to the
hook, and the hook reads it. The hook may tell us to pretend that we
fetched refs/heads/next on the remote that points at the commit $Y should
update refs/remotes/origin/pu. In FETCH_HEAD we leave where the commit was
fetched from and hook will affect that information (which is used in the
resulting merge commit log message), but otherwise I do not think anything
unexpected would happen, as tracking refs do not record where the stuff
came from (perhaps they are in reflog? I didn't check).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-26  7:59         ` Junio C Hamano
@ 2011-12-26 15:51           ` Joey Hess
  2011-12-27  6:37             ` Junio C Hamano
  2011-12-27 23:23             ` Junio C Hamano
  0 siblings, 2 replies; 16+ messages in thread
From: Joey Hess @ 2011-12-26 15:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git


[-- Attachment #1.1: Type: text/plain, Size: 4024 bytes --]

Junio C Hamano wrote:
> Thanks. I do have a few comments.
> 
> This hook is no longer a "post" fetch hook. The mechanism lets the object
> transfer phase does its work and then rewrites/tweaks the result before
> fetch completes. To an outside observer, what the hook does is an integral
> part of what "fetch" does, and not something that happens _after_ fetch
> completes. I am bad at naming things, but something along the lines of
> "tweak-fetch" that makes it clear that what happens in the hook is still
> part of the fetching may be a more appropriate name, methinks.

I'd be happy to call it tweak-fetch.

> I very much on purpose said that the hook "must read everything from its
> standard input, *and* *then* return..." in my response. Your "Demo" hook
> emits output as it reads its input with sed, but your main process invokes
> the hook, drains everything with write_in_full() before starting to read a
> single line, so I suspect that your hook will deadlock when its output
> pipe buffer fills up without being read by the main process. Of course,
> for this deadlock to actually happen, you need to be fetching quite a lot
> of refs.

Doesn't having the hook be fed by the async process/thread avoid this
problem? That's exactly why I did that, being familiar with the problem
from stupid perl scripts like this one:

#!/usr/bin/perl
use FileHandle;
use IPC::Open2;
$pid = open2(*Reader, *Writer, "cat");
print Writer "stuff $_\n" for 1..100000; # blocks forever
close Writer;
print "got: $_" while <Reader>;

I've tested feeding large quantities of data through my demo hook
(just feed it all the lines repeated 100 thousand times) and have
not seen it block. And other code in git uses an async feeder similarly,
see for example convert.c's apply_filter(). So I think this is ok..?

> We seem to already have too many hook drivers, each of which hand-roll
> similar logic using run-command API. At some point, we would want to have
> a single "run_hook" helper function that takes:

This was also my feeling as I looked around the code.

> By formalizing the hook driver API that way, any hook driver that drives a
> tricky hook that may need a select(2) loop to avoid a deadlock in a way
> similar to your patch do would not have to worry about the issue, as the
> run_hook() helper would take care of it by reading from the hook's output
> pipe and drain the pipe by calling the "consumer" callback before calling
> the "generator" callback and feed more input to the hook to cause a
> deadlock.

Actually, run_hook would need to either have a select loop, or an async
feeder like I've used. The method you describe is itself prone to deadlock!
Consider a hook that *does* consume all its input before outputting anything.
The first call to the "consumer" callback would block.

>  - run a set of hooks on the same triggering condition. You may want to
>    have two "post-receive" hooks, one to feed an e-mail based notification
>    system and another to drive an autobuilder

Something I've always wanted, in fact..

> I have been wondering when would be the good time to refactor the hook
> driver API.  We can add your patch, after polishing it enough to make it
> ready for inclusion, independent of the hook API refactoring. But that
> would mean that it would require more work when refactoring the hook API,
> as we would have one more hand-rolled hook caller that is based on
> run_command().

On the other hand, I have uses for this hook that are blocked on it
getting into git, so would rather not see it hung up in a general
refactoring. If it helps, I'd be happy to help with a hook refactoring
later.

The latest version of my patch (attached) already lifts reading from the
hook out into a helper function, so should need not many changes in such
a refactoring -- most of my run_tweak_fetch_hook would simply go away
then. I've also cleaned it up a lot and and pretty happy with it now.

-- 
see shy jo

[-- Attachment #1.2: 0001-preparations-for-tweak-fetch-hook.patch --]
[-- Type: text/x-diff, Size: 4094 bytes --]

From a06b5ef9908f56692a07fb37f5794c4122f10491 Mon Sep 17 00:00:00 2001
From: Joey Hess <joey@kitenet.net>
Date: Mon, 26 Dec 2011 10:44:55 -0400
Subject: [PATCH 1/2] preparations for tweak-fetch hook

No behavior changes yet, only some groundwork for the next
change.

The refs_result structure combines a status code with a ref map,
which can be NULL even on success. This will be needed when
there's a tweak-fetch hook, because it can filter out all refs,
while still succeeding.

fetch_refs returns a refs_result, so that it can modify the ref_map.
---
 builtin/fetch.c |   54 +++++++++++++++++++++++++++++++++++-------------------
 1 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 33ad3aa..70b9f89 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -29,6 +29,11 @@ enum {
 	TAGS_SET = 2
 };
 
+struct refs_result {
+	struct ref *new_refs;
+	int status;
+};
+
 static int all, append, dry_run, force, keep, multiple, prune, update_head_ok, verbosity;
 static int progress, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT;
@@ -89,6 +94,15 @@ static struct option builtin_fetch_options[] = {
 	OPT_END()
 };
 
+static int add_existing(const char *refname, const unsigned char *sha1,
+			int flag, void *cbdata)
+{
+	struct string_list *list = (struct string_list *)cbdata;
+	struct string_list_item *item = string_list_insert(list, refname);
+	item->util = (void *)sha1;
+	return 0;
+}
+
 static void unlock_pack(void)
 {
 	if (transport)
@@ -507,17 +521,24 @@ static int quickfetch(struct ref *ref_map)
 	return check_everything_connected(iterate_ref_map, 1, &rm);
 }
 
-static int fetch_refs(struct transport *transport, struct ref *ref_map)
+static struct refs_result fetch_refs(struct transport *transport,
+		struct ref *ref_map)
 {
-	int ret = quickfetch(ref_map);
-	if (ret)
-		ret = transport_fetch_refs(transport, ref_map);
-	if (!ret)
-		ret |= store_updated_refs(transport->url,
+	struct refs_result res;
+	res.status = quickfetch(ref_map);
+	if (res.status)
+		res.status = transport_fetch_refs(transport, ref_map);
+	if (!res.status) {
+		res.new_refs = ref_map
+		res.status |= store_updated_refs(transport->url,
 				transport->remote->name,
-				ref_map);
+				res.new_refs);
+	}
+	else {
+		res.new_refs = ref_map;
+	}
 	transport_unlock_pack(transport);
-	return ret;
+	return res;
 }
 
 static int prune_refs(struct refspec *refs, int ref_count, struct ref *ref_map)
@@ -542,15 +563,6 @@ static int prune_refs(struct refspec *refs, int ref_count, struct ref *ref_map)
 	return result;
 }
 
-static int add_existing(const char *refname, const unsigned char *sha1,
-			int flag, void *cbdata)
-{
-	struct string_list *list = (struct string_list *)cbdata;
-	struct string_list_item *item = string_list_insert(list, refname);
-	item->util = (void *)sha1;
-	return 0;
-}
-
 static int will_fetch(struct ref **head, const unsigned char *sha1)
 {
 	struct ref *rm = *head;
@@ -673,6 +685,7 @@ static int do_fetch(struct transport *transport,
 	struct string_list_item *peer_item = NULL;
 	struct ref *ref_map;
 	struct ref *rm;
+	struct refs_result res;
 	int autotags = (transport->remote->fetch_tags == 1);
 
 	for_each_ref(add_existing, &existing_refs);
@@ -710,7 +723,9 @@ static int do_fetch(struct transport *transport,
 
 	if (tags == TAGS_DEFAULT && autotags)
 		transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
-	if (fetch_refs(transport, ref_map)) {
+	res = fetch_refs(transport, ref_map);
+	ref_map = res.new_refs;
+	if (res.status) {
 		free_refs(ref_map);
 		return 1;
 	}
@@ -750,7 +765,8 @@ static int do_fetch(struct transport *transport,
 		if (ref_map) {
 			transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
 			transport_set_option(transport, TRANS_OPT_DEPTH, "0");
-			fetch_refs(transport, ref_map);
+			res = fetch_refs(transport, ref_map);
+			ref_map = res.new_refs;
 		}
 		free_refs(ref_map);
 	}
-- 
1.7.7.3


[-- Attachment #1.3: 0002-add-tweak-fetch-hook.patch --]
[-- Type: text/x-diff, Size: 7599 bytes --]

From 073b0921bb5988628e7af423924c410f522f403a Mon Sep 17 00:00:00 2001
From: Joey Hess <joey@kitenet.net>
Date: Mon, 26 Dec 2011 10:53:27 -0400
Subject: [PATCH 2/2] add tweak-fetch hook

The tweak-fetch hook is fed lines on stdin for all refs that were fetched,
and outputs on stdout possibly modified lines. Its output is parsed and
used when git fetch updates the remote tracking refs, records the entries
in FETCH_HEAD, and produces its report.
---
 Documentation/githooks.txt |   29 +++++++
 builtin/fetch.c            |  191 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 219 insertions(+), 1 deletions(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 28edefa..be2624c 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -162,6 +162,35 @@ This hook can be used to perform repository validity checks, auto-display
 differences from the previous HEAD if different, or set working dir metadata
 properties.
 
+tweak-fetch
+~~~~~~~~~~
+
+This hook is invoked by 'git fetch' (commonly called by 'git pull'), after
+refs have been fetched from the remote repository. It is not executed, if
+nothing was fetched.
+
+The output of the hook is used to update the remote-tracking branches, and
+`.git/FETCH_HEAD`, in preparation for for a later merge operation done by
+'git merge'.
+
+It takes no arguments, but is fed a line of the following format on
+its standard input for each ref that was fetched.
+
+  <sha1> SP not-for-merge|merge SP <remote-refname> SP <local-refname> LF
+
+Where the "not-for-merge" flag indicates the ref is not to be merged into the
+current branch, and the "merge" flag indicates that 'git merge' should
+later merge it. The `<remote-refname>` is the remote's name for the ref
+that was pulled, and `<local-refname>` is a name of a remote-tracking branch,
+like "refs/remotes/origin/master", or can be empty if the fetched ref is not
+being stored in a local refname.
+
+The hook must consume all of its standard input, and output back lines
+of the same format. It can modify its input as desired, including
+adding or removing lines, updating the sha1 (i.e. re-point the
+remote-tracking branch), changing the merge flag, and changing the
+`<local-refname>` (i.e. use different remote-tracking branch).
+
 post-merge
 ~~~~~~~~~~
 
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 70b9f89..5434b6f 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -103,6 +103,194 @@ static int add_existing(const char *refname, const unsigned char *sha1,
 	return 0;
 }
 
+static const char tweak_fetch_hook[] = "tweak-fetch";
+
+int feed_tweak_fetch_hook (int in, int out, void *data)
+{
+	struct ref *ref;
+	struct strbuf buf = STRBUF_INIT;
+	int ret;
+
+	for (ref = data; ref; ref = ref->next) {
+		strbuf_addstr(&buf, sha1_to_hex(ref->old_sha1));
+		strbuf_addch(&buf, ' ');
+		strbuf_addstr(&buf, ref->merge ? "merge" : "not-for-merge");
+		strbuf_addch(&buf, ' ');
+		if (ref->name)
+			strbuf_addstr(&buf, ref->name);
+		strbuf_addch(&buf, ' ');
+		if (ref->peer_ref && ref->peer_ref->name)
+			strbuf_addstr(&buf, ref->peer_ref->name);
+		strbuf_addch(&buf, '\n');
+	}
+
+	ret = write_in_full(out, buf.buf, buf.len) != buf.len;
+	if (ret)
+		warning("%s hook failed to consume all its input",
+				tweak_fetch_hook);
+	close(out);
+	strbuf_release(&buf);
+	return ret;
+}
+	
+struct ref *parse_tweak_fetch_hook_line (char *l, 
+		struct string_list *existing_refs)
+{
+	struct ref *ref = NULL, *peer_ref = NULL;
+	struct string_list_item *peer_item = NULL;
+	char *words[4];
+	int i, word=0;
+	char *problem;
+
+	for (i=0; l[i]; i++) {
+		if (isspace(l[i])) {
+			l[i]='\0';
+			words[word]=l;
+			l+=i+1;
+			i=0;
+			word++;
+			if (word > 3) {
+				problem="too many words";
+				goto unparsable;
+			}
+		}
+	}
+	if (word < 3) {
+		problem="not enough words";
+		goto unparsable;
+	}
+	
+	ref = alloc_ref(words[2]);
+	peer_ref = ref->peer_ref = alloc_ref(l);
+	ref->peer_ref->force=1;
+
+	if (get_sha1_hex(words[0], ref->old_sha1)) {
+		problem="bad sha1";
+		goto unparsable;
+	}
+
+	if (strcmp(words[1], "merge") == 0) {
+		ref->merge=1;
+	}
+	else if (strcmp(words[1], "not-for-merge") != 0) {
+		problem="bad merge flag";
+		goto unparsable;
+	}
+
+	peer_item = string_list_lookup(existing_refs, peer_ref->name);
+	if (peer_item)
+		hashcpy(peer_ref->old_sha1, peer_item->util);
+
+	return ref;
+
+ unparsable:
+	warning("%s hook output a wrongly formed line: %s",
+			tweak_fetch_hook, problem);
+	free(ref);
+	free(peer_ref);
+	return NULL;
+}
+
+struct refs_result read_tweak_fetch_hook (int in) {
+	struct refs_result res;
+	FILE *f;
+	struct strbuf buf;
+	struct string_list existing_refs = STRING_LIST_INIT_NODUP;
+	struct ref *ref, *prevref=NULL;
+
+	res.status = 0;
+	res.new_refs = NULL;
+
+	f = fdopen(in, "r");
+	if (f == NULL) {
+		res.status = 1;
+		return res;
+	}
+
+	strbuf_init(&buf, 128);
+	for_each_ref(add_existing, &existing_refs);
+
+	while (strbuf_getline(&buf, f, '\n') != EOF) {
+		char *l = strbuf_detach(&buf, NULL);
+		ref = parse_tweak_fetch_hook_line(l, &existing_refs);
+		if (ref) {
+			if (prevref) {
+				prevref->next=ref;
+				prevref=ref;
+			}
+			else {
+				res.new_refs = prevref = ref;
+			}
+		}
+		else {
+			res.status = 1;
+		}
+		free(l);
+	}
+
+	string_list_clear(&existing_refs, 0);
+	strbuf_release(&buf);
+	fclose(f);
+	return res;
+}
+
+/* The hook is fed lines of the form:
+ * <sha1> SP <not-for-merge|merge> SP <remote-refname> SP <local-refname> LF
+ * And should output rewritten lines of the same form.
+ */
+struct ref *run_tweak_fetch_hook (struct ref *fetched_refs)
+{
+	struct child_process hook;
+	const char *argv[2];
+	struct async async;
+	struct refs_result res;
+
+	if (! fetched_refs)
+		return fetched_refs;
+
+	argv[0] = git_path("hooks/%s", tweak_fetch_hook);
+	if (access(argv[0], X_OK) < 0)
+		return fetched_refs;
+	argv[1] = NULL;
+
+	memset(&hook, 0, sizeof(hook));
+	hook.argv = argv;
+	hook.in = -1;
+	hook.out = -1;
+	if (start_command(&hook))
+		return fetched_refs;
+
+	/* Use an async writer to feed the hook process.
+	 * This allows the hook to read and write a line at
+	 * a time without blocking. */
+	memset(&async, 0, sizeof(async));
+	async.proc = feed_tweak_fetch_hook;
+	async.data = fetched_refs;
+	async.out = hook.in;
+	if (start_async(&async)) {
+		close(hook.in);
+		close(hook.out);
+		finish_command(&hook);
+		return fetched_refs;
+	}
+	res = read_tweak_fetch_hook(hook.out);
+	res.status |= finish_async(&async);
+	res.status |= finish_command(&hook);
+
+	if (res.status) {
+		warning("%s hook failed, ignoring its output", tweak_fetch_hook);
+		free(res.new_refs);
+		return fetched_refs;
+	}
+	else {
+		/* The new_refs are returned, to be used in place of
+		 * fetched_refs, so it is not needed anymore and can
+		 * be freed here. */
+		free_refs(fetched_refs);
+		return res.new_refs;
+	}
+}
+
 static void unlock_pack(void)
 {
 	if (transport)
@@ -529,7 +717,8 @@ static struct refs_result fetch_refs(struct transport *transport,
 	if (res.status)
 		res.status = transport_fetch_refs(transport, ref_map);
 	if (!res.status) {
-		res.new_refs = ref_map
+		res.new_refs = run_tweak_fetch_hook(ref_map);
+
 		res.status |= store_updated_refs(transport->url,
 				transport->remote->name,
 				res.new_refs);
-- 
1.7.7.3


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-26 15:51           ` Joey Hess
@ 2011-12-27  6:37             ` Junio C Hamano
  2011-12-27 15:49               ` Joey Hess
  2011-12-27 23:23             ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2011-12-27  6:37 UTC (permalink / raw)
  To: Joey Hess; +Cc: git

Joey Hess <joey@kitenet.net> writes:

> .... And other code in git uses an async feeder similarly,
> see for example convert.c's apply_filter(). So I think this is ok..?

Yeah, I didn't look at your patch (sorry) but if it uses async like the
filtering codepath does, it should be perfectly fine (please forget about
the select(2) based kludge I alluded to; the async interface is the right
thing to use here).

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-27  6:37             ` Junio C Hamano
@ 2011-12-27 15:49               ` Joey Hess
  0 siblings, 0 replies; 16+ messages in thread
From: Joey Hess @ 2011-12-27 15:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 925 bytes --]

Junio C Hamano wrote:
> Joey Hess <joey@kitenet.net> writes:
> 
> > .... And other code in git uses an async feeder similarly,
> > see for example convert.c's apply_filter(). So I think this is ok..?
> 
> Yeah, I didn't look at your patch (sorry) but if it uses async like the
> filtering codepath does, it should be perfectly fine (please forget about
> the select(2) based kludge I alluded to; the async interface is the right
> thing to use here).

No problem, I was surprised to be getting responses at all over the
holidays. :)

Then async also seems the right thing to use for the hook refactoring. A
caller can provide two function pointers; a feeder function that is
called async, and a reader that is *not* called async (which would allow
it to modify program state), and the refactored hook function handles
running the hook(s) and connecting them to the feeder and/or reader.

-- 
see shy jo

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-26  2:31       ` Joey Hess
  2011-12-26  7:59         ` Junio C Hamano
@ 2011-12-27 21:27         ` Johannes Sixt
  2011-12-28 19:30           ` Joey Hess
  1 sibling, 1 reply; 16+ messages in thread
From: Johannes Sixt @ 2011-12-27 21:27 UTC (permalink / raw)
  To: Joey Hess; +Cc: Junio C Hamano, git

Am 26.12.2011 03:31, schrieb Joey Hess:
> +int feed_post_fetch_hook (int in, int out, void *data)
> +{
> +	struct ref *ref;
> +	struct strbuf buf = STRBUF_INIT;

Is there a particular reason that you accumulate everything in a buffer?

If I read the loop below correctly, you should be able to run it using
only the functions sha1_to_hex(), strlen() and write_in_full(). This
would avoid any problems with concurrent calls to xmalloc().

> +	int ret;
> +
> +	for (ref = post_fetch_hook_refs; ref; ref = ref->next) {
> +		strbuf_addstr(&buf, sha1_to_hex(ref->old_sha1));

sha1_to_hex() works with a static buffer. Are you certain that it is not
called concurrently in the main thread?

> +		strbuf_addch(&buf, ' ');
> +		strbuf_addstr(&buf, ref->merge ? "merge" : "not-for-merge");
> +		strbuf_addch(&buf, ' ');
> +		if (ref->name)
> +			strbuf_addstr(&buf, ref->name);
> +		strbuf_addch(&buf, ' ');
> +		if (ref->peer_ref && ref->peer_ref->name)
> +			strbuf_addstr(&buf, ref->peer_ref->name);
> +		strbuf_addch(&buf, '\n');
> +	}

-- Hannes

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-26 15:51           ` Joey Hess
  2011-12-27  6:37             ` Junio C Hamano
@ 2011-12-27 23:23             ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2011-12-27 23:23 UTC (permalink / raw)
  To: Joey Hess; +Cc: git

Joey Hess <joey@kitenet.net> writes:

> From 073b0921bb5988628e7af423924c410f522f403a Mon Sep 17 00:00:00 2001
> From: Joey Hess <joey@kitenet.net>
> Date: Mon, 26 Dec 2011 10:53:27 -0400
> Subject: [PATCH 2/2] add tweak-fetch hook
>
> The tweak-fetch hook is fed lines on stdin for all refs that were fetched,
> and outputs on stdout possibly modified lines. Its output is parsed and
> used when git fetch updates the remote tracking refs, records the entries
> in FETCH_HEAD, and produces its report.
> ---

Just a few style things, as this is not a signed-off patch yet.

> @@ -162,6 +162,35 @@ This hook can be used to perform repository validity checks, auto-display
>  differences from the previous HEAD if different, or set working dir metadata
>  properties.
>  
> +tweak-fetch
> +~~~~~~~~~~

The underline does not match what is being underlined. Does this format well?

> +This hook is invoked by 'git fetch' (commonly called by 'git pull'), after
> +refs have been fetched from the remote repository. It is not executed, if
> +nothing was fetched.
> +
> +The output of the hook is used to update the remote-tracking branches, and
> +`.git/FETCH_HEAD`, in preparation for for a later merge operation done by
> +'git merge'.
> +
> +It takes no arguments, but is fed a line of the following format on
> +its standard input for each ref that was fetched.
> +
> +  <sha1> SP not-for-merge|merge SP <remote-refname> SP <local-refname> LF
> +
> +Where the "not-for-merge" flag indicates the ref is not to be merged into the
> +current branch, and the "merge" flag indicates that 'git merge' should
> +later merge it. The `<remote-refname>` is the remote's name for the ref
> +that was pulled, and `<local-refname>` is a name of a remote-tracking branch,

s/pulled/fetched/; I think. The remainder of the new text seems to use the
right terminology.

> +int feed_tweak_fetch_hook (int in, int out, void *data)

No SP between function name and the opening parenthesis of its parameter
list. We have SP after control-flow keywords e.g. "for (;;)" though.

Does this name need to be external (same question to many other new
functions in this patch)?

The "in" parameter seems unused. Does it have to be there for the "feed"
callback of the generic hook driver? As long as it is the "feed" callback,
I think that it just needs to take "out" and no "in", no?

> +	for (ref = data; ref; ref = ref->next) {
> +		strbuf_addstr(&buf, sha1_to_hex(ref->old_sha1));
> +		strbuf_addch(&buf, ' ');
> +		strbuf_addstr(&buf, ref->merge ? "merge" : "not-for-merge");
> +		strbuf_addch(&buf, ' ');

strbuf_addf()?

But this might be a moot point, as J6t seems to have valid worries on
running functions that allocate memory in general...

> +	ret = write_in_full(out, buf.buf, buf.len) != buf.len;
> +	if (ret)
> +		warning("%s hook failed to consume all its input",
> +				tweak_fetch_hook);
> +	close(out);

I was hoping that this part would be part of more generic hook driver
infrastructure. Even if we were to take this series before we refactor
existing other hook drivers, in order to avoid duplicated work later, we
could at least start from a right implementation of a generic hook driver
with a single user (which is the "tweak-fetch" hook driver), no?

> +struct ref *parse_tweak_fetch_hook_line (char *l, 
> +		struct string_list *existing_refs)
> +{
> +	struct ref *ref = NULL, *peer_ref = NULL;
> +	struct string_list_item *peer_item = NULL;
> +	char *words[4];
> +	int i, word=0;

SP around assingment and initialization "var = val" (throughout this
patch).

> +	char *problem;
> +
> +	for (i=0; l[i]; i++) {

Likewise.

> +		if (isspace(l[i])) {
> +			l[i]='\0';
> +			words[word]=l;
> +			l+=i+1;
> +			i=0;
> +			word++;
> +			if (word > 3) {
> +				problem="too many words";
> +				goto unparsable;
> +			}
> +		}
> +	}
> +	if (word < 3) {
> +		problem="not enough words";
> +		goto unparsable;
> +	}

Perhaps loop for up-to ARRAY_SIZE(words) times and use strchr()?

> +	if (strcmp(words[1], "merge") == 0) {

We tend to say "if (!strcmp(...))" instead.

> +		ref->merge=1;
> +	}
> +	else if (strcmp(words[1], "not-for-merge") != 0) {

Likewise.

> +struct refs_result read_tweak_fetch_hook (int in) {

Opening brace at column 1 of the next line.

> +			if (prevref) {
> +				prevref->next=ref;
> +				prevref=ref;
> +			}
> +			else {

	if (...) {
		...
	} else {
		...
	}

> +/* The hook is fed lines of the form:
> + * <sha1> SP <not-for-merge|merge> SP <remote-refname> SP <local-refname> LF
> + * And should output rewritten lines of the same form.
> + */

	/*
         * We write our multi-line comments
         * like this (applies to a few other comments
         * in this patch).
         */

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] add post-fetch hook
  2011-12-27 21:27         ` Johannes Sixt
@ 2011-12-28 19:30           ` Joey Hess
  0 siblings, 0 replies; 16+ messages in thread
From: Joey Hess @ 2011-12-28 19:30 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 3821 bytes --]

Johannes Sixt wrote:
> If I read the loop below correctly, you should be able to run it using
> only the functions sha1_to_hex(), strlen() and write_in_full(). This
> would avoid any problems with concurrent calls to xmalloc().
> 
> > +	int ret;
> > +
> > +	for (ref = post_fetch_hook_refs; ref; ref = ref->next) {
> > +		strbuf_addstr(&buf, sha1_to_hex(ref->old_sha1));
> 
> sha1_to_hex() works with a static buffer. Are you certain that it is not
> called concurrently in the main thread?

Thanks very much for pointing that thread-unsafty out.

Based on that, my current thinking for a generic hook interface is that,
rather than the caller providing an arbitrary "feeder" function that gets
run async, the caller should provide a function that generates a strbuf
containing the stdout for the hook, and then a very simple async writer
can handle the actual writing.

static int feed_hook(int in, int out, void *data)
{
        struct strbuf *buf = data;
        return write_in_full(out, buf->buf, buf->len) != buf->len;
}

(I assume that write_in_full is safe to be run async?)

I am working on a patch that will involve adding a hook_complex()
and changing hook() to be implemented in terms of it. The header
for that is included below, you should get a very good idea of how
it will work from the data structure.

/*
 * This data structure controls how a hook is run.
 */
struct hook {
	/* The name of the hook being run. */
	const char *name;
	/* Parameters to pass to the hook program, not including the name
	 * of the hook. May be NULL. */
	struct argv_array *argv_array;
	/* Pathname to an index file to use, or NULL if the hook
	 * uses the default index file or no index is needed. */
	const char *index_file;
	/*
	 * An arbitrary data structure, can be populated and modified to
	 * communicate between the feeder, reader, and caller of the hook.
	 */
	void *data;
	/* 
	 * A hook can optionally not consume all of its stdin.
	 * If partial_stdin is 0, it is an error for some stdin not
	 * to be consumed.
	 */
	int partial_stdin;
	/* 
	 * feeder populates a strbuf with the content to send to the
	 * hook on its standard input.
	 *
	 * May be NULL, if the hook does not consume standard input.
	 *
	 * Note that feeder might be run more than once, if multiple
	 * programs are run as part of a single hook. It should avoid
	 * taking any actions except for reading from data and generating
	 * the strbuf. It will *not* be run async, and need not worry
	 * about contending with other threads.
	 */
	struct strbuf *(*feeder)(struct hook *hook);
	/*
	 * reader processes the hook's standard output from the handle,
	 * returning 0 on success, non-zero on failure.
	 *
	 * May be NULL, if the hook's stdin is not processed. (It will
	 * instead be redirected to stderr.)
	 *
	 * Note that reader might be run more than once, if multiple
	 * programs are run as part of a single hook. It should avoid
	 * taking any actions except for reading from the input handle,
	 * changing the content of data, and printing any necessary
	 * warnings. It will *not* be run async, and need not worry
	 * about contending with other threads.
	 */
	int (*reader)(struct hook *hook, int handle);
};

extern int run_hook(const char *index_file, const char *name, ...);

extern int run_hook_complex(struct hook *hook);


This design allows for a future where multiple scripts get run for a
single hook. In that case, the feeder and reader functions would get
called repeatedly in a loop, with a data flow like this, where the
reader modifies hook.data, providing the next call of the feeder with
the new data read from the hook script:
    feeder | hook_script_1 | reader | feeder | hook_script_2 | reader

-- 
see shy jo

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2011-12-28 19:30 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-24 23:42 [PATCH] add post-fetch hook Joey Hess
2011-12-25  3:13 ` Junio C Hamano
2011-12-25  3:50   ` Joey Hess
2011-12-25  9:30     ` Junio C Hamano
2011-12-25 16:24       ` Jakub Narebski
2011-12-25 18:06         ` Joey Hess
2011-12-26  8:09         ` Junio C Hamano
2011-12-25 17:54       ` Joey Hess
2011-12-26  2:31       ` Joey Hess
2011-12-26  7:59         ` Junio C Hamano
2011-12-26 15:51           ` Joey Hess
2011-12-27  6:37             ` Junio C Hamano
2011-12-27 15:49               ` Joey Hess
2011-12-27 23:23             ` Junio C Hamano
2011-12-27 21:27         ` Johannes Sixt
2011-12-28 19:30           ` Joey Hess

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.