* [PATCH] Enable parallelism in git submodule update.
@ 2012-07-27 18:37 Stefan Zager
  2012-07-27 21:38 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Stefan Zager @ 2012-07-27 18:37 UTC (permalink / raw)
  To: git; +Cc: gitster, jens.lehmann, hvoigt

The --jobs parameter may be used to set the degree of per-submodule
parallel execution.

Signed-off-by: Stefan Zager <szager@google.com>
---
 Documentation/git-submodule.txt |  8 +++++++-
 git-submodule.sh                | 23 ++++++++++++++++++++++-
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index fbbbcb2..34f81fb 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -14,7 +14,8 @@ SYNOPSIS
 'git submodule' [--quiet] status [--cached] [--recursive] [--] [<path>...]
 'git submodule' [--quiet] init [--] [<path>...]
 'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
-	      [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
+	      [--reference <repository>] [--merge] [--recursive]
+	      [-j|--jobs [jobs]] [--] [<path>...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
 	      [commit] [--] [<path>...]
 'git submodule' [--quiet] foreach [--recursive] <command>
@@ -147,6 +148,11 @@ If the submodule is not yet initialized, and you just want to use the
 setting as stored in .gitmodules, you can automatically initialize the
 submodule with the `--init` option.
 +
+By default, each submodule is treated serially.  You may specify a degree of
+parallel execution with the --jobs flag.  If a parameter is provided, it is
+the maximum number of jobs to run in parallel; without a parameter, all jobs are
+run in parallel.
++
 If `--recursive` is specified, this command will recurse into the
 registered submodules, and update any nested submodules within.
 
diff --git a/git-submodule.sh b/git-submodule.sh
index dba4d39..761420a 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
 USAGE="[--quiet] add [-b branch] [-f|--force] [--reference <repository>] [--] <repository> [<path>]
    or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...]
    or: $dashless [--quiet] init [--] [<path>...]
-   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
+   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [-j|--jobs [jobs]] [--] [<path>...]
    or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
    or: $dashless [--quiet] foreach [--recursive] <command>
    or: $dashless [--quiet] sync [--] [<path>...]"
@@ -473,6 +473,7 @@ cmd_update()
 {
 	# parse $args after "submodule ... update".
 	orig_flags=
+	jobs="1"
 	while test $# -ne 0
 	do
 		case "$1" in
@@ -491,6 +492,20 @@ cmd_update()
 		-r|--rebase)
 			update="rebase"
 			;;
+		-j|--jobs)
+			case "$2" in
+			''|-*)
+				jobs="0"
+				;;
+			*)
+				jobs="$2"
+				shift
+				;;
+			esac
+			# Don't preserve this arg.
+			shift
+			continue
+			;;
 		--reference)
 			case "$2" in '') usage ;; esac
 			reference="--reference=$2"
@@ -529,6 +544,12 @@ cmd_update()
 		cmd_init "--" "$@" || return
 	fi
 
+	if test "$jobs" != "1"
+	then
+		module_list "$@" | awk '{print $4}' | xargs -L 1 -P "$jobs" git submodule update $orig_args
+		return
+	fi
+
 	cloned_modules=
 	module_list "$@" | {
 	err=
-- 
1.7.11.rc2


* Re: [PATCH] Enable parallelism in git submodule update.
  2012-07-27 18:37 [PATCH] Enable parallelism in git submodule update Stefan Zager
@ 2012-07-27 21:38 ` Junio C Hamano
       [not found]   ` <CAHOQ7J_jYAe7r1q6Cg9OJb8f+79UfS=JfRk9NrS4R4a+oLM8LA@mail.gmail.com>
  2012-07-28 10:22 ` Heiko Voigt
  2012-07-29 15:37 ` [PATCH] Enable parallelism in git submodule update Jens Lehmann
  2 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2012-07-27 21:38 UTC (permalink / raw)
  To: Stefan Zager; +Cc: git, jens.lehmann, hvoigt

Stefan Zager <szager@google.com> writes:

> +		module_list "$@" | awk '{print $4}' | xargs -L 1 -P "$jobs" git submodule update $orig_args

Capital-P option to xargs is not even in POSIX, no?


* Re: [PATCH] Enable parallelism in git submodule update.
       [not found]   ` <CAHOQ7J_jYAe7r1q6Cg9OJb8f+79UfS=JfRk9NrS4R4a+oLM8LA@mail.gmail.com>
@ 2012-07-27 23:25     ` Junio C Hamano
  2012-07-28 10:52       ` Heiko Voigt
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2012-07-27 23:25 UTC (permalink / raw)
  To: Stefan Zager; +Cc: git, jens.lehmann, hvoigt

Stefan Zager <szager@google.com> writes:

> On Fri, Jul 27, 2012 at 2:38 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
>> Stefan Zager <szager@google.com> writes:
>>
>> > +             module_list "$@" | awk '{print $4}' | xargs -L 1 -P
>> "$jobs" git submodule update $orig_args
>>
>> Capital-P option to xargs is not even in POSIX, no?
>
> I wasn't aware of that, but you appear to be correct.  Don't know if you
> have a policy about that, but anecdotally, -P is supported on my linux,
> mac, and win/msys systems.

About "policy", we use POSIX as a rough yardstick to warn us that we
might be breaking people on minority platforms.  We do _not_ say "It
is in POSIX, so it is safe to use it", but we say "It is not even in
POSIX, so we need to think twice."  We do not usually say "Linux,
Mac and Windows are the only things that matter, and they all
support it."

Of course, any set of rules has exceptions ;-) There are a few
things to which we say "Even though it is not in POSIX, everybody
who matters supports it, and without taking advantage of it, what we
want to achieve will become too cumbersome to express".

In the core parts of the system, we try to be very conservative. In
the fringes that nobody cares about, we tend to be looser.


* Re: [PATCH] Enable parallelism in git submodule update.
  2012-07-27 18:37 [PATCH] Enable parallelism in git submodule update Stefan Zager
  2012-07-27 21:38 ` Junio C Hamano
@ 2012-07-28 10:22 ` Heiko Voigt
  2012-07-28 12:19   ` [PATCH] cleanup argument passing in submodule status command Heiko Voigt
  2012-07-29 15:37 ` [PATCH] Enable parallelism in git submodule update Jens Lehmann
  2 siblings, 1 reply; 19+ messages in thread
From: Heiko Voigt @ 2012-07-28 10:22 UTC (permalink / raw)
  To: Stefan Zager; +Cc: git, gitster, jens.lehmann

Hi Stefan,

neat patch. See below for a few notes.

On Fri, Jul 27, 2012 at 11:37:34AM -0700, Stefan Zager wrote:
> diff --git a/git-submodule.sh b/git-submodule.sh
> index dba4d39..761420a 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -491,6 +492,20 @@ cmd_update()
>  		-r|--rebase)
>  			update="rebase"
>  			;;
> +		-j|--jobs)
> +			case "$2" in
> +			''|-*)
> +				jobs="0"
> +				;;
> +			*)
> +				jobs="$2"
> +				shift
> +				;;
> +			esac
> +			# Don't preserve this arg.
> +			shift
> +			continue
> +			;;
>  		--reference)
>  			case "$2" in '') usage ;; esac
>  			reference="--reference=$2"
> @@ -529,6 +544,12 @@ cmd_update()
>  		cmd_init "--" "$@" || return
>  	fi
>  
> +	if test "$jobs" != "1"
> +	then
> +		module_list "$@" | awk '{print $4}' | xargs -L 1 -P "$jobs" git submodule update $orig_args

I do not see orig_args set anywhere in git-submodule.sh. Its existing
use in cmd_status() seems to be a leftover from commit 98dbe63, in
which this variable was renamed to orig_flags.

I will follow up with a patch to that location.

Another problem here is the passing of arguments. Have a look at
a7eff1a8 to see how this was solved for other locations.
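
A minimal sketch of the argument-passing problem, not taken from the
referenced commit: sq_quote and count_args are hypothetical helpers
written for this illustration. It assumes the flags are accumulated in
a single string, as orig_flags is, and shows why each flag has to be
quoted when stored and expanded again with eval.

```shell
#!/bin/sh
# Hypothetical helper (the real script may use a different mechanism):
# wrap one argument in single quotes so that a later "eval" sees it as
# exactly one word. Embedded single quotes become '\''.
sq_quote () {
	printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Hypothetical stand-in for the recursive call: report the word count.
count_args () {
	echo $#
}

# Naive accumulation: the space inside the path is indistinguishable
# from the separator between arguments.
naive_flags="--reference /srv/my repos/base.git"
count_args $naive_flags          # prints 3

# Quoted accumulation, expanded again with eval.
orig_flags=
for arg in --reference "/srv/my repos/base.git"
do
	orig_flags="$orig_flags $(sq_quote "$arg")"
done
eval count_args "$orig_flags"    # prints 2
```

The same concern applies to anything handed to xargs: an unquoted
variable on the command line is split on whitespace before xargs ever
sees it.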

The next thing I noticed is that the parallelism is not recursive. You
drop the option and only execute the first level in parallel. How about
using the number of modules given by the arguments left in $@ as an
indicator of whether you need to fork parallel execution or not? If
there is exactly one, you do the update; if there are more, you do the
parallel thing. That way you can just keep passing the --jobs flag to
the subprocesses.

The next question to solve is UI: Since the output lines of the parallel
update jobs will be mixed, we need some way to distinguish them. Imagine
one of the updates fails somewhere: how do we find out which one it was?

Two possible solutions come to my mind:

 1. Prefix each line with a job number. This way you can distinguish
    which process outputted what and still have immediate feedback.

 2. Cache the output (to stderr and stdout) of each job and output it
    once one job is done. I imagine this needs some infrastructure which
    we need to implement. We already have some ideas how to collect such
    output in C here[1].

I would prefer solution 2 since the output of 1 will be hard to read but
I guess we could start with 1 and then move over to 2 later on.
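
Solution 1 could be sketched roughly as follows. This is only an
illustration: label_output is a hypothetical helper, not something in
git-submodule.sh, and it assumes the label contains no characters that
are special to sed.

```shell
#!/bin/sh
# Hypothetical helper: run a command and prefix every line it prints
# with a label, so that interleaved output from parallel jobs can still
# be attributed to the job that wrote it.
label_output () {
	label=$1
	shift
	# Merge stderr into stdout so both streams carry the prefix.
	# Note: the pipeline returns the exit status of sed, not of the
	# command, so failures would need separate bookkeeping.
	"$@" 2>&1 | sed "s|^|[$label] |"
}

# Two jobs running concurrently; the order of the two lines may vary,
# but each line says which job produced it.
label_output 1 echo "updating sub-a" &
label_output 2 echo "updating sub-b" &
wait
```

The user gets immediate feedback this way, at the cost of having to
untangle the lines of different jobs by eye.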

Cheers Heiko

[1] http://article.gmane.org/gmane.comp.version-control.git/197747


* Re: [PATCH] Enable parallelism in git submodule update.
  2012-07-27 23:25     ` Junio C Hamano
@ 2012-07-28 10:52       ` Heiko Voigt
  2012-07-29 21:59         ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Heiko Voigt @ 2012-07-28 10:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Zager, git, jens.lehmann

Hi,

On Fri, Jul 27, 2012 at 04:25:58PM -0700, Junio C Hamano wrote:
> Stefan Zager <szager@google.com> writes:
> 
> > On Fri, Jul 27, 2012 at 2:38 PM, Junio C Hamano <gitster@pobox.com> wrote:
> >
> >> Stefan Zager <szager@google.com> writes:
> >>
> >> > +             module_list "$@" | awk '{print $4}' | xargs -L 1 -P
> >> "$jobs" git submodule update $orig_args
> >>
> >> Capital-P option to xargs is not even in POSIX, no?
> >
> > I wasn't aware of that, but you appear to be correct.  Don't know if you
> > have a policy about that, but anecdotally, -P is supported on my linux,
> > mac, and win/msys systems.
> 
> About "policy", we use POSIX as a rough yardstick to warn us that we
> might be breaking people on minority platforms.  We do _not_ say "It
> is in POSIX, so it is safe to use it", but we say "It is not even in
> POSIX, so we need to think twice."  We do not usually say "Linux,
> Mac and Windows are the only things that matter, and they all
> support it."
> 
> > Of course, any set of rules has exceptions ;-) There are a few
> things to which we say "Even though it is not in POSIX, everybody
> who matters supports it, and without taking advantage of it, what we
> want to achieve will become too cumbersome to express".

I was about to write that since this is limited to a given --jobs
option, the majority platforms should be enough as a start, and others
could add a parallelism mechanism later. It's only a matter of efficiency
and not features.

But if you look at my other post to this thread, I described that we need
some UI output extension so the user can still make sense of it.
In short: The user should be able to distinguish which job said what.

I was already thinking about how an output caching could be implemented in
core git. How about exposing it as a git command like this?

	git run [-j<number>] ...

It works like the xargs call above except that it caches each job's
output to stderr and stdout until it's done and then replays the output
to stderr/stdout in the correct order.
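
A rough shell sketch of that behaviour, under several simplifying
assumptions: run_buffered is a hypothetical function (no such git
command exists in this thread), it merges stderr into stdout instead of
replaying them separately, it ignores exit codes, it requires N >= 1,
and it relies on mktemp, which is widely available but not in POSIX.

```shell
#!/bin/sh
# run_buffered N CMD ARG...: run "CMD ARG" once per ARG, at most N at a
# time, capturing the output of each job in its own file and replaying
# the files in argument order once a batch of jobs has finished.
run_buffered () {
	max=$1 cmd=$2
	shift 2
	tmp=$(mktemp -d) || return 1
	n=0 shown=0
	for arg
	do
		n=$((n + 1))
		"$cmd" "$arg" >"$tmp/$n" 2>&1 &
		# Once N jobs are in flight, wait for the batch and replay it.
		if test $((n - shown)) -ge "$max"
		then
			wait
			while test "$shown" -lt "$n"
			do
				shown=$((shown + 1))
				cat "$tmp/$shown"
			done
		fi
	done
	# Replay whatever is left from the final, possibly partial, batch.
	wait
	while test "$shown" -lt "$n"
	do
		shown=$((shown + 1))
		cat "$tmp/$shown"
	done
	rm -rf "$tmp"
}

# The lines come out in argument order, whichever job finishes first.
run_buffered 2 echo sub-a sub-b sub-c
```

Batching keeps the sketch within plain sh; a real implementation would
more likely start a new job as soon as any slot frees up.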

We could design the code so that it can be reused later on to do the
caching in parallel fetch/push/... .

What do you think? If we decide to go this route I would have a look
into whipping something up.

Cheers Heiko


* [PATCH] cleanup argument passing in submodule status command
  2012-07-28 10:22 ` Heiko Voigt
@ 2012-07-28 12:19   ` Heiko Voigt
  2012-07-29  6:22     ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Heiko Voigt @ 2012-07-28 12:19 UTC (permalink / raw)
  To: gitster; +Cc: git, jens.lehmann, Stefan Zager

In commit 98dbe63 the variable $orig_args was renamed to $orig_flags.
One location in cmd_status() was missed.

Note: This is a code cleanup and does not fix any bugs. As a side effect
the variables containing the parsed flags to "git submodule status" are
passed down recursively. So everything was already behaving as expected.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
---
 git-submodule.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index dba4d39..3a3f0a4 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -961,7 +961,7 @@ cmd_status()
 				prefix="$displaypath/"
 				clear_local_git_env
 				cd "$sm_path" &&
-				eval cmd_status "$orig_args"
+				eval cmd_status "$orig_flags"
 			) ||
 			die "$(eval_gettext "Failed to recurse into submodule path '\$sm_path'")"
 		fi
-- 
1.7.12.rc0.23.g3c7cae0


* Re: [PATCH] cleanup argument passing in submodule status command
  2012-07-28 12:19   ` [PATCH] cleanup argument passing in submodule status command Heiko Voigt
@ 2012-07-29  6:22     ` Junio C Hamano
  2012-07-29 15:29       ` Jens Lehmann
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2012-07-29  6:22 UTC (permalink / raw)
  To: Heiko Voigt; +Cc: git, jens.lehmann, Stefan Zager

Heiko Voigt <hvoigt@hvoigt.net> writes:

> Note: This is a code cleanup and does not fix any bugs. As a side effect
> the variables containing the parsed flags to "git submodule status" are
> passed down recursively. So everything was already behaving as expected.

If that is the case, shouldn't we stop passing anything down, if we
want it to be a "clean-up only, no behaviour changes" patch?  While
at it, we may want to kill that code to accumulate the original
options in orig_flags because we haven't been using the variable.

We _know_ $orig_args has been empty, i.e. the code has been working
fine with only cmd_status there.  Nobody has tried what happens when
we pass the original arguments to cmd_status on that line.  The
patch changes the behaviour of the code; it makes the command line
parsing "while" loop run again, and if the code that accumulates
original options in orig_flags has been buggy, now that bug will be
exposed.




> Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
> ---
>  git-submodule.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/git-submodule.sh b/git-submodule.sh
> index dba4d39..3a3f0a4 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -961,7 +961,7 @@ cmd_status()
>  				prefix="$displaypath/"
>  				clear_local_git_env
>  				cd "$sm_path" &&
> -				eval cmd_status "$orig_args"
> +				eval cmd_status "$orig_flags"
>  			) ||
>  			die "$(eval_gettext "Failed to recurse into submodule path '\$sm_path'")"
>  		fi


* Re: [PATCH] cleanup argument passing in submodule status command
  2012-07-29  6:22     ` Junio C Hamano
@ 2012-07-29 15:29       ` Jens Lehmann
  2012-07-29 21:57         ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Lehmann @ 2012-07-29 15:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Heiko Voigt, git, Stefan Zager

Am 29.07.2012 08:22, schrieb Junio C Hamano:
> Heiko Voigt <hvoigt@hvoigt.net> writes:
> 
>> Note: This is a code cleanup and does not fix any bugs. As a side effect
>> the variables containing the parsed flags to "git submodule status" are
>> passed down recursively. So everything was already behaving as expected.
> 
> If that is the case, shouldn't we stop passing anything down, if we
> want it to be a "clean-up only, no behaviour changes" patch?  While
> at it, we may want to kill that code to accumulate the original
> options in orig_flags because we haven't been using the variable.
> 
> We _know_ $orig_args has been empty, i.e. the code has been working
> fine with only cmd_status there.  Nobody has tried what happens when
> we pass the original arguments to cmd_status on that line.

I tried today. Before this change no arguments got passed down and
afterwards they are (but just the arguments, no submodule paths
were passed on in either case; which is what Kevin fixed in the
commit Heiko referenced). Three arguments are allowed for "git
submodule status":

--recursive:
It doesn't matter if we pass that on or not because $recursive is
reused when "eval cmd_status" is executed.

--quiet:
Same as recursive, GIT_QUIET is set the first time and then reused
in the recursion.

--cached:
This was dropped when recursing into submodules but isn't anymore
with Heiko's change, so we do have a change in behavior here.

>  The
> patch changes the behaviour of the code; it makes the command line
> parsing "while" loop run again, and if the code that accumulates
> original options in orig_flags has been buggy, now that bug will be
> exposed.

Hmm, when --cached is used together with --recursive, I would expect
it to show the commit stored in the index for the deeper submodules
too (and not magically switch to show their HEAD again after the
first level of submodules). To me this looks like a bug which Kevin
accidentally introduced and nobody noticed and/or reported until now.

So I'd vote for making this a bugfix patch for "git submodule status
--cached --recursive" (and would love to see a test for it ;-).

>> Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
>> ---
>>  git-submodule.sh | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index dba4d39..3a3f0a4 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -961,7 +961,7 @@ cmd_status()
>>  				prefix="$displaypath/"
>>  				clear_local_git_env
>>  				cd "$sm_path" &&
>> -				eval cmd_status "$orig_args"
>> +				eval cmd_status "$orig_flags"
>>  			) ||
>>  			die "$(eval_gettext "Failed to recurse into submodule path '\$sm_path'")"
>>  		fi


* Re: [PATCH] Enable parallelism in git submodule update.
  2012-07-27 18:37 [PATCH] Enable parallelism in git submodule update Stefan Zager
  2012-07-27 21:38 ` Junio C Hamano
  2012-07-28 10:22 ` Heiko Voigt
@ 2012-07-29 15:37 ` Jens Lehmann
  2012-11-03 19:07   ` Jens Lehmann
  2 siblings, 1 reply; 19+ messages in thread
From: Jens Lehmann @ 2012-07-29 15:37 UTC (permalink / raw)
  To: Stefan Zager; +Cc: git, gitster, hvoigt

Am 27.07.2012 20:37, schrieb Stefan Zager:
> The --jobs parameter may be used to set the degree of per-submodule
> parallel execution.

I think this is a sound idea, but it would be good to see some
actual measurements. What are the performance numbers with and
without this change? Which cases do benefit and are there some
which run slower when run in parallel?


* Re: [PATCH] cleanup argument passing in submodule status command
  2012-07-29 15:29       ` Jens Lehmann
@ 2012-07-29 21:57         ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2012-07-29 21:57 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Heiko Voigt, git, Stefan Zager

Jens Lehmann <Jens.Lehmann@web.de> writes:

> I tried today. Before this change no arguments got passed down and
> afterwards they are (but just the arguments, no submodule paths
> were passed on in either case; which is what Kevin fixed in the
> commit Heiko referenced). Three arguments are allowed for "git
> submodule status":
>
> --recursive:
> It doesn't matter if we pass that on or not because $recursive is
> reused when "eval cmd_status" is executed.
>
> --quiet:
> Same as recursive, GIT_QUIET is set the first time and then reused
> in the recursion.
>
> --cached:
> This was dropped when recursing into submodules but isn't anymore
> with Heiko's change, so we do have a change in behavior here.
> ...
> Hmm, when --cached is used together with --recursive, I would expect
> it to show the commit stored in the index for the deeper submodules
> too (and not magically switch to show their HEAD again after the
> first level of submodules). To me this looks like a bug which Kevin
> accidentally introduced and nobody noticed and/or reported until now.
>
> So I'd vote for making this a bugfix patch for "git submodule status
> --cached --recursive" (and would love to see a test for it ;-).

Yeah, I am not opposed to a "fix".  I just wanted it to be labelled
as such, and analysed correctly.

And with test ;-)

Thanks.


* Re: [PATCH] Enable parallelism in git submodule update.
  2012-07-28 10:52       ` Heiko Voigt
@ 2012-07-29 21:59         ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2012-07-29 21:59 UTC (permalink / raw)
  To: Heiko Voigt; +Cc: Stefan Zager, git, jens.lehmann

Heiko Voigt <hvoigt@hvoigt.net> writes:

> On Fri, Jul 27, 2012 at 04:25:58PM -0700, Junio C Hamano wrote:
> ...
>> Of course, any set of rules has exceptions ;-) There are a few
>> things to which we say "Even though it is not in POSIX, everybody
>> who matters supports it, and without taking advantage of it, what we
>> want to achieve will become too cumbersome to express".
>
> I was about to write that since this is limited to a given --jobs
> option, the majority platforms should be enough as a start, and others
> could add a parallelism mechanism later. It's only a matter of efficiency
> and not features.

As long as "git submodule update --jobs 9" on a platform without GNU
enhanced xargs does not error out and gracefully degrades to non-parallel
execution, I do not have any problem with it.  As posted, the patch has
not achieved that yet.
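
One way such graceful degradation could look, as a sketch only:
xargs_has_parallel and run_each are hypothetical names, and the probe
simply checks whether a trivial "xargs -P" invocation succeeds. Items
are split on whitespace by "xargs -n 1", so paths containing spaces are
not handled here.

```shell
#!/bin/sh
# Hypothetical probe: does this xargs accept the non-POSIX -P option?
xargs_has_parallel () {
	echo probe | xargs -P 2 true >/dev/null 2>&1
}

# run_each JOBS CMD...: read items from stdin and run "CMD <item>" for
# each one, in parallel when JOBS is not 1 and xargs supports it,
# serially otherwise.
run_each () {
	jobs=$1
	shift
	if test "$jobs" != 1 && xargs_has_parallel
	then
		xargs -n 1 -P "$jobs" "$@"
	else
		# Warn on stderr only when parallelism was actually requested.
		test "$jobs" = 1 ||
		echo >&2 "warning: xargs -P is not supported; running serially"
		xargs -n 1 "$@"
	fi
}

# With -P available the three lines may appear in any order.
printf '%s\n' sub-a sub-b sub-c | run_each 4 echo updating
```

Sending the warning to stderr keeps it out of any output that callers
might parse.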


* Re: [PATCH] Enable parallelism in git submodule update.
  2012-07-29 15:37 ` [PATCH] Enable parallelism in git submodule update Jens Lehmann
@ 2012-11-03 19:07   ` Jens Lehmann
  0 siblings, 0 replies; 19+ messages in thread
From: Jens Lehmann @ 2012-11-03 19:07 UTC (permalink / raw)
  To: Stefan Zager; +Cc: git, gitster, hvoigt

Am 29.07.2012 17:37, schrieb Jens Lehmann:
> Am 27.07.2012 20:37, schrieb Stefan Zager:
>> The --jobs parameter may be used to set the degree of per-submodule
>> parallel execution.
> 
> I think this is a sound idea, but it would be good to see some
> actual measurements. What are the performance numbers with and
> without this change? Which cases do benefit and are there some
> which run slower when run in parallel?

ping?


* Re: [PATCH] Enable parallelism in git submodule update.
  2012-11-03 18:44     ` Phil Hord
@ 2012-11-03 19:13       ` Jens Lehmann
  0 siblings, 0 replies; 19+ messages in thread
From: Jens Lehmann @ 2012-11-03 19:13 UTC (permalink / raw)
  To: Phil Hord; +Cc: Stefan Zager, git, Heiko Voigt, Junio C Hamano

Am 03.11.2012 19:44, schrieb Phil Hord:
> On Sat, Nov 3, 2012 at 11:42 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>> Am 30.10.2012 19:11, schrieb Stefan Zager:
>>> This is a refresh of a conversation from a couple of months ago.
>>>
>>> I didn't try to implement all the desired features (e.g., smart logic
>>> for passing a -j parameter to recursive submodule invocations), but I
>>> did address the one issue that Junio insisted on: the code makes a
>>> best effort to detect whether xargs supports parallel execution on the
>>> host platform, and if it doesn't, then it prints a warning and falls
>>> back to serial execution.
>>
>> I suspect not passing on --jobs recursively like you do here is the
>> right thing to do, as that would give exponential growth of jobs with
>> recursion depth, which makes no sense to me.
> 
> On the other hand, since $jobs is still defined when the recursive
> call is made to 'eval cmd_update "$orig_flags"', I suspect the
> value *is* passed down recursively.

But for $jobs != 1 Stefan's code doesn't use eval cmd_update but
starts the submodule script again:

+                       xargs $max_lines -P "$jobs" git submodule update $orig_flags

That should get rid of the $jobs setting, or am I missing something?

>  Maybe $jobs should be manually
> reset before recursing -- unless it is "0" -- though I expect someone
> would feel differently if she had one submodule on level 1 and 10
> submodules on level 2.  She would be surprised, then, when  --jobs=10
> seemed to have little effect on performance.

Hmm, good point. However we implement that, it should at least be
properly documented in the man page (and in the use case you describe
a "git submodule foreach 'git submodule update -j 10'" could be the
solution if we choose to not propagate the jobs option).

>  So maybe it is best to
> leave it as it is, excepting that the apparent attempt not to pass the
> switch down is probably misleading.

I didn't test it, but I think it should work (famous last words ;-).


* Re: [PATCH] Enable parallelism in git submodule update.
  2012-11-03 15:42   ` Jens Lehmann
@ 2012-11-03 18:44     ` Phil Hord
  2012-11-03 19:13       ` Jens Lehmann
  0 siblings, 1 reply; 19+ messages in thread
From: Phil Hord @ 2012-11-03 18:44 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Stefan Zager, git, Heiko Voigt, Junio C Hamano

On Sat, Nov 3, 2012 at 11:42 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
> Am 30.10.2012 19:11, schrieb Stefan Zager:
>> This is a refresh of a conversation from a couple of months ago.
>>
>> I didn't try to implement all the desired features (e.g., smart logic
>> for passing a -j parameter to recursive submodule invocations), but I
>> did address the one issue that Junio insisted on: the code makes a
>> best effort to detect whether xargs supports parallel execution on the
>> host platform, and if it doesn't, then it prints a warning and falls
>> back to serial execution.
>
> I suspect not passing on --jobs recursively like you do here is the
> right thing to do, as that would give exponential growth of jobs with
> recursion depth, which makes no sense to me.

On the other hand, since $jobs is still defined when the recursive
call is made to 'eval cmd_update "$orig_flags"', I suspect the
value *is* passed down recursively.  Maybe $jobs should be manually
reset before recursing -- unless it is "0" -- though I expect someone
would feel differently if she had one submodule on level 1 and 10
submodules on level 2.  She would be surprised, then, when --jobs=10
seemed to have little effect on performance.  So maybe it is best to
leave it as it is, except that the apparent attempt not to pass the
switch down is probably misleading.

Phil


* Re: [PATCH] Enable parallelism in git submodule update.
  2012-10-30 18:11 ` Stefan Zager
  2012-11-02 21:49   ` Stefan Zager
@ 2012-11-03 15:42   ` Jens Lehmann
  2012-11-03 18:44     ` Phil Hord
  1 sibling, 1 reply; 19+ messages in thread
From: Jens Lehmann @ 2012-11-03 15:42 UTC (permalink / raw)
  To: Stefan Zager; +Cc: git, Heiko Voigt, Junio C Hamano

Am 30.10.2012 19:11, schrieb Stefan Zager:
> This is a refresh of a conversation from a couple of months ago.
> 
> I didn't try to implement all the desired features (e.g., smart logic
> for passing a -j parameter to recursive submodule invocations), but I
> did address the one issue that Junio insisted on: the code makes a
> best effort to detect whether xargs supports parallel execution on the
> host platform, and if it doesn't, then it prints a warning and falls
> back to serial execution.

I suspect not passing on --jobs recursively like you do here is the
right thing to do, as that would give exponential growth of jobs with
recursion depth, which makes no sense to me.

A still unsolved issue is the unstructured output from the different
update jobs. It'll be hard (if not impossible) to see in what submodule
which update took place (or failed). I think we should have a solution
for that too (maybe one of those Heiko mentioned or something as simple
as implying "-q"?).

> Stefan
> 
> On Tue, Oct 30, 2012 at 11:03 AM,  <szager@google.com> wrote:
>> The --jobs parameter may be used to set the degree of per-submodule
>> parallel execution.
>
>> Signed-off-by: Stefan Zager <szager@google.com>
>> ---
>>  Documentation/git-submodule.txt |    8 ++++++-
>>  git-submodule.sh                |   40 ++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 46 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
>> index b4683bb..cb23ba7 100644
>> --- a/Documentation/git-submodule.txt
>> +++ b/Documentation/git-submodule.txt
>> @@ -14,7 +14,8 @@ SYNOPSIS
>>  'git submodule' [--quiet] status [--cached] [--recursive] [--] [<path>...]
>>  'git submodule' [--quiet] init [--] [<path>...]
>>  'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
>> -             [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
>> +             [--reference <repository>] [--merge] [--recursive]
>> +             [-j|--jobs [jobs]] [--] [<path>...]
>>  'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
>>               [commit] [--] [<path>...]
>>  'git submodule' [--quiet] foreach [--recursive] <command>
>> @@ -146,6 +147,11 @@ If the submodule is not yet initialized, and you just want to use the
>>  setting as stored in .gitmodules, you can automatically initialize the
>>  submodule with the `--init` option.
>>  +
>> +By default, each submodule is treated serially.  You may specify a degree of
>> +parallel execution with the --jobs flag.  If a parameter is provided, it is
>> +the maximum number of jobs to run in parallel; without a parameter, all jobs are
>> +run in parallel.
>> ++

The new "--jobs" option should be documented under "OPTIONS" (and the
text could mention that "--jobs 0" does the same as "--jobs" alone and
that this is not supported on all platforms).

>>  If `--recursive` is specified, this command will recurse into the
>>  registered submodules, and update any nested submodules within.
>>  +
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index ab6b110..60a5f96 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
>>  USAGE="[--quiet] add [-b branch] [-f|--force] [--reference <repository>] [--] <repository> [<path>]
>>     or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...]
>>     or: $dashless [--quiet] init [--] [<path>...]
>> -   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
>> +   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [-j|--jobs [jobs]] [--] [<path>...]
>>     or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
>>     or: $dashless [--quiet] foreach [--recursive] <command>
>>     or: $dashless [--quiet] sync [--] [<path>...]"
>> @@ -500,6 +500,7 @@ cmd_update()
>>  {
>>         # parse $args after "submodule ... update".
>>         orig_flags=
>> +       jobs="1"
>>         while test $# -ne 0
>>         do
>>                 case "$1" in
>> @@ -518,6 +519,20 @@ cmd_update()
>>                 -r|--rebase)
>>                         update="rebase"
>>                         ;;
>> +               -j|--jobs)
>> +                       case "$2" in
>> +                       ''|-*)
>> +                               jobs="0"
>> +                               ;;
>> +                       *)
>> +                               jobs="$2"
>> +                               shift
>> +                               ;;
>> +                       esac
>> +                       # Don't preserve this arg.
>> +                       shift
>> +                       continue
>> +                       ;;
>>                 --reference)
>>                         case "$2" in '') usage ;; esac
>>                         reference="--reference=$2"
>> @@ -551,11 +566,34 @@ cmd_update()
>>                 shift
>>         done
>>
>> +       # Correctly handle the case where '-q' came before 'update' on the command line.
>> +       if test -n "$GIT_QUIET"
>> +       then
>> +               orig_flags="$orig_flags -q"
>> +       fi
>> +
>>         if test -n "$init"
>>         then
>>                 cmd_init "--" "$@" || return
>>         fi
>>
>> +       if test "$jobs" != 1
>> +       then
>> +               if ( echo test | xargs -P "$jobs" true 2>/dev/null )
>> +               then
>> +                       if ( echo test | xargs --max-lines=1 true 2>/dev/null ); then
>> +                               max_lines="--max-lines=1"
>> +                       else
>> +                               max_lines="-L 1"
>> +                       fi
>> +                       module_list "$@" | awk '{print $4}' |
>> +                       xargs $max_lines -P "$jobs" git submodule update $orig_flags
>> +                       return
>> +               else
>> +                       echo "Warn: parallel execution is not supported on this platform."
>> +               fi
>> +       fi
>> +
>>         cloned_modules=
>>         module_list "$@" | {
>>         err=
>> --
>> 1.7.7.3
>>

* Re: [PATCH] Enable parallelism in git submodule update.
  2012-10-30 18:11 ` Stefan Zager
@ 2012-11-02 21:49   ` Stefan Zager
  2012-11-03 15:42   ` Jens Lehmann
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Zager @ 2012-11-02 21:49 UTC (permalink / raw)
  To: git; +Cc: Jens Lehmann, Heiko Voigt, Junio C Hamano

ping?

On Tue, Oct 30, 2012 at 11:11 AM, Stefan Zager <szager@google.com> wrote:
> This is a refresh of a conversation from a couple of months ago.
>
> I didn't try to implement all the desired features (e.g., smart logic
> for passing a -j parameter to recursive submodule invocations), but I
> did address the one issue that Junio insisted on: the code makes a
> best effort to detect whether xargs supports parallel execution on the
> host platform, and if it doesn't, then it prints a warning and falls
> back to serial execution.
>
> Stefan
>
> On Tue, Oct 30, 2012 at 11:03 AM,  <szager@google.com> wrote:
>> The --jobs parameter may be used to set the degree of per-submodule
>> parallel execution.
>>
>> Signed-off-by: Stefan Zager <szager@google.com>
>> ---
>>  Documentation/git-submodule.txt |    8 ++++++-
>>  git-submodule.sh                |   40 ++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 46 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
>> index b4683bb..cb23ba7 100644
>> --- a/Documentation/git-submodule.txt
>> +++ b/Documentation/git-submodule.txt
>> @@ -14,7 +14,8 @@ SYNOPSIS
>>  'git submodule' [--quiet] status [--cached] [--recursive] [--] [<path>...]
>>  'git submodule' [--quiet] init [--] [<path>...]
>>  'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
>> -             [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
>> +             [--reference <repository>] [--merge] [--recursive]
>> +             [-j|--jobs [jobs]] [--] [<path>...]
>>  'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
>>               [commit] [--] [<path>...]
>>  'git submodule' [--quiet] foreach [--recursive] <command>
>> @@ -146,6 +147,11 @@ If the submodule is not yet initialized, and you just want to use the
>>  setting as stored in .gitmodules, you can automatically initialize the
>>  submodule with the `--init` option.
>>  +
>> +By default, each submodule is treated serially.  You may specify a degree of
>> +parallel execution with the --jobs flag.  If a parameter is provided, it is
>> +the maximum number of jobs to run in parallel; without a parameter, all jobs are
>> +run in parallel.
>> ++
>>  If `--recursive` is specified, this command will recurse into the
>>  registered submodules, and update any nested submodules within.
>>  +
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index ab6b110..60a5f96 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
>>  USAGE="[--quiet] add [-b branch] [-f|--force] [--reference <repository>] [--] <repository> [<path>]
>>     or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...]
>>     or: $dashless [--quiet] init [--] [<path>...]
>> -   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
>> +   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [-j|--jobs [jobs]] [--] [<path>...]
>>     or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
>>     or: $dashless [--quiet] foreach [--recursive] <command>
>>     or: $dashless [--quiet] sync [--] [<path>...]"
>> @@ -500,6 +500,7 @@ cmd_update()
>>  {
>>         # parse $args after "submodule ... update".
>>         orig_flags=
>> +       jobs="1"
>>         while test $# -ne 0
>>         do
>>                 case "$1" in
>> @@ -518,6 +519,20 @@ cmd_update()
>>                 -r|--rebase)
>>                         update="rebase"
>>                         ;;
>> +               -j|--jobs)
>> +                       case "$2" in
>> +                       ''|-*)
>> +                               jobs="0"
>> +                               ;;
>> +                       *)
>> +                               jobs="$2"
>> +                               shift
>> +                               ;;
>> +                       esac
>> +                       # Don't preserve this arg.
>> +                       shift
>> +                       continue
>> +                       ;;
>>                 --reference)
>>                         case "$2" in '') usage ;; esac
>>                         reference="--reference=$2"
>> @@ -551,11 +566,34 @@ cmd_update()
>>                 shift
>>         done
>>
>> +       # Correctly handle the case where '-q' came before 'update' on the command line.
>> +       if test -n "$GIT_QUIET"
>> +       then
>> +               orig_flags="$orig_flags -q"
>> +       fi
>> +
>>         if test -n "$init"
>>         then
>>                 cmd_init "--" "$@" || return
>>         fi
>>
>> +       if test "$jobs" != 1
>> +       then
>> +               if ( echo test | xargs -P "$jobs" true 2>/dev/null )
>> +               then
>> +                       if ( echo test | xargs --max-lines=1 true 2>/dev/null ); then
>> +                               max_lines="--max-lines=1"
>> +                       else
>> +                               max_lines="-L 1"
>> +                       fi
>> +                       module_list "$@" | awk '{print $4}' |
>> +                       xargs $max_lines -P "$jobs" git submodule update $orig_flags
>> +                       return
>> +               else
>> +                       echo "Warn: parallel execution is not supported on this platform."
>> +               fi
>> +       fi
>> +
>>         cloned_modules=
>>         module_list "$@" | {
>>         err=
>> --
>> 1.7.7.3
>>

* Re: [PATCH] Enable parallelism in git submodule update.
  2012-10-30 18:03 szager
@ 2012-10-30 18:11 ` Stefan Zager
  2012-11-02 21:49   ` Stefan Zager
  2012-11-03 15:42   ` Jens Lehmann
  0 siblings, 2 replies; 19+ messages in thread
From: Stefan Zager @ 2012-10-30 18:11 UTC (permalink / raw)
  To: git; +Cc: Jens Lehmann, Heiko Voigt, Junio C Hamano

This is a refresh of a conversation from a couple of months ago.

I didn't try to implement all the desired features (e.g., smart logic
for passing a -j parameter to recursive submodule invocations), but I
did address the one issue that Junio insisted on: the code makes a
best effort to detect whether xargs supports parallel execution on the
host platform, and if it doesn't, then it prints a warning and falls
back to serial execution.
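
A minimal standalone sketch of that detect-and-fall-back approach, under
some assumptions: the helper names xargs_supports_parallel and
run_per_path are hypothetical, "echo updating" stands in for the real
per-submodule command, and "-n 1" is used here in place of the patch's
--max-lines/-L probing to keep the example short.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the local xargs accepts "-P <n>".
xargs_supports_parallel () {
	echo test | xargs -P "$1" true 2>/dev/null
}

# Hypothetical helper: reads one path per line on stdin and runs a
# placeholder command for each, in parallel when xargs allows it.
run_per_path () {
	jobs=$1
	if test "$jobs" != 1 && xargs_supports_parallel "$jobs"
	then
		# One invocation per path, at most $jobs at a time.
		xargs -n 1 -P "$jobs" echo updating
	else
		if test "$jobs" != 1
		then
			echo "warning: parallel execution is not supported on this platform" >&2
		fi
		# Serial fallback: same work, one path after another.
		while read -r sm_path
		do
			echo updating "$sm_path"
		done
	fi
}

printf '%s\n' sub1 sub2 sub3 | run_per_path 2
```

With parallel execution the output lines may appear in any order, so a
caller that needs deterministic output would have to sort or collect
them.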

Stefan

On Tue, Oct 30, 2012 at 11:03 AM,  <szager@google.com> wrote:
> The --jobs parameter may be used to set the degree of per-submodule
> parallel execution.
>
> Signed-off-by: Stefan Zager <szager@google.com>
> ---
>  Documentation/git-submodule.txt |    8 ++++++-
>  git-submodule.sh                |   40 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
> index b4683bb..cb23ba7 100644
> --- a/Documentation/git-submodule.txt
> +++ b/Documentation/git-submodule.txt
> @@ -14,7 +14,8 @@ SYNOPSIS
>  'git submodule' [--quiet] status [--cached] [--recursive] [--] [<path>...]
>  'git submodule' [--quiet] init [--] [<path>...]
>  'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
> -             [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
> +             [--reference <repository>] [--merge] [--recursive]
> +             [-j|--jobs [jobs]] [--] [<path>...]
>  'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
>               [commit] [--] [<path>...]
>  'git submodule' [--quiet] foreach [--recursive] <command>
> @@ -146,6 +147,11 @@ If the submodule is not yet initialized, and you just want to use the
>  setting as stored in .gitmodules, you can automatically initialize the
>  submodule with the `--init` option.
>  +
> +By default, each submodule is treated serially.  You may specify a degree of
> +parallel execution with the --jobs flag.  If a parameter is provided, it is
> +the maximum number of jobs to run in parallel; without a parameter, all jobs are
> +run in parallel.
> ++
>  If `--recursive` is specified, this command will recurse into the
>  registered submodules, and update any nested submodules within.
>  +
> diff --git a/git-submodule.sh b/git-submodule.sh
> index ab6b110..60a5f96 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
>  USAGE="[--quiet] add [-b branch] [-f|--force] [--reference <repository>] [--] <repository> [<path>]
>     or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...]
>     or: $dashless [--quiet] init [--] [<path>...]
> -   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
> +   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [-j|--jobs [jobs]] [--] [<path>...]
>     or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
>     or: $dashless [--quiet] foreach [--recursive] <command>
>     or: $dashless [--quiet] sync [--] [<path>...]"
> @@ -500,6 +500,7 @@ cmd_update()
>  {
>         # parse $args after "submodule ... update".
>         orig_flags=
> +       jobs="1"
>         while test $# -ne 0
>         do
>                 case "$1" in
> @@ -518,6 +519,20 @@ cmd_update()
>                 -r|--rebase)
>                         update="rebase"
>                         ;;
> +               -j|--jobs)
> +                       case "$2" in
> +                       ''|-*)
> +                               jobs="0"
> +                               ;;
> +                       *)
> +                               jobs="$2"
> +                               shift
> +                               ;;
> +                       esac
> +                       # Don't preserve this arg.
> +                       shift
> +                       continue
> +                       ;;
>                 --reference)
>                         case "$2" in '') usage ;; esac
>                         reference="--reference=$2"
> @@ -551,11 +566,34 @@ cmd_update()
>                 shift
>         done
>
> +       # Correctly handle the case where '-q' came before 'update' on the command line.
> +       if test -n "$GIT_QUIET"
> +       then
> +               orig_flags="$orig_flags -q"
> +       fi
> +
>         if test -n "$init"
>         then
>                 cmd_init "--" "$@" || return
>         fi
>
> +       if test "$jobs" != 1
> +       then
> +               if ( echo test | xargs -P "$jobs" true 2>/dev/null )
> +               then
> +                       if ( echo test | xargs --max-lines=1 true 2>/dev/null ); then
> +                               max_lines="--max-lines=1"
> +                       else
> +                               max_lines="-L 1"
> +                       fi
> +                       module_list "$@" | awk '{print $4}' |
> +                       xargs $max_lines -P "$jobs" git submodule update $orig_flags
> +                       return
> +               else
> +                       echo "Warn: parallel execution is not supported on this platform."
> +               fi
> +       fi
> +
>         cloned_modules=
>         module_list "$@" | {
>         err=
> --
> 1.7.7.3
>

* [PATCH] Enable parallelism in git submodule update.
@ 2012-10-30 18:03 szager
  2012-10-30 18:11 ` Stefan Zager
  0 siblings, 1 reply; 19+ messages in thread
From: szager @ 2012-10-30 18:03 UTC (permalink / raw)
  To: git; +Cc: jens.lehmann, hvoigt, gitster

The --jobs parameter may be used to set the degree of per-submodule
parallel execution.

Signed-off-by: Stefan Zager <szager@google.com>
---
 Documentation/git-submodule.txt |    8 ++++++-
 git-submodule.sh                |   40 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index b4683bb..cb23ba7 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -14,7 +14,8 @@ SYNOPSIS
 'git submodule' [--quiet] status [--cached] [--recursive] [--] [<path>...]
 'git submodule' [--quiet] init [--] [<path>...]
 'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
-	      [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
+	      [--reference <repository>] [--merge] [--recursive]
+	      [-j|--jobs [jobs]] [--] [<path>...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
 	      [commit] [--] [<path>...]
 'git submodule' [--quiet] foreach [--recursive] <command>
@@ -146,6 +147,11 @@ If the submodule is not yet initialized, and you just want to use the
 setting as stored in .gitmodules, you can automatically initialize the
 submodule with the `--init` option.
 +
+By default, each submodule is treated serially.  You may specify a degree of
+parallel execution with the --jobs flag.  If a parameter is provided, it is
+the maximum number of jobs to run in parallel; without a parameter, all jobs are
+run in parallel.
++
 If `--recursive` is specified, this command will recurse into the
 registered submodules, and update any nested submodules within.
 +
diff --git a/git-submodule.sh b/git-submodule.sh
index ab6b110..60a5f96 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
 USAGE="[--quiet] add [-b branch] [-f|--force] [--reference <repository>] [--] <repository> [<path>]
    or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...]
    or: $dashless [--quiet] init [--] [<path>...]
-   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
+   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [-j|--jobs [jobs]] [--] [<path>...]
    or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
    or: $dashless [--quiet] foreach [--recursive] <command>
    or: $dashless [--quiet] sync [--] [<path>...]"
@@ -500,6 +500,7 @@ cmd_update()
 {
 	# parse $args after "submodule ... update".
 	orig_flags=
+	jobs="1"
 	while test $# -ne 0
 	do
 		case "$1" in
@@ -518,6 +519,20 @@ cmd_update()
 		-r|--rebase)
 			update="rebase"
 			;;
+		-j|--jobs)
+			case "$2" in
+			''|-*)
+				jobs="0"
+				;;
+			*)
+				jobs="$2"
+				shift
+				;;
+			esac
+			# Don't preserve this arg.
+			shift
+			continue
+			;;
 		--reference)
 			case "$2" in '') usage ;; esac
 			reference="--reference=$2"
@@ -551,11 +566,34 @@ cmd_update()
 		shift
 	done
 
+	# Correctly handle the case where '-q' came before 'update' on the command line.
+	if test -n "$GIT_QUIET"
+	then
+		orig_flags="$orig_flags -q"
+	fi
+
 	if test -n "$init"
 	then
 		cmd_init "--" "$@" || return
 	fi
 
+	if test "$jobs" != 1
+	then
+		if ( echo test | xargs -P "$jobs" true 2>/dev/null )
+		then
+			if ( echo test | xargs --max-lines=1 true 2>/dev/null ); then
+				max_lines="--max-lines=1"
+			else
+				max_lines="-L 1"
+			fi
+			module_list "$@" | awk '{print $4}' |
+			xargs $max_lines -P "$jobs" git submodule update $orig_flags
+			return
+		else
+			echo "Warn: parallel execution is not supported on this platform."
+		fi
+	fi
+
 	cloned_modules=
 	module_list "$@" | {
 	err=
-- 
1.7.7.3

Thread overview: 19+ messages
2012-07-27 18:37 [PATCH] Enable parallelism in git submodule update Stefan Zager
2012-07-27 21:38 ` Junio C Hamano
     [not found]   ` <CAHOQ7J_jYAe7r1q6Cg9OJb8f+79UfS=JfRk9NrS4R4a+oLM8LA@mail.gmail.com>
2012-07-27 23:25     ` Junio C Hamano
2012-07-28 10:52       ` Heiko Voigt
2012-07-29 21:59         ` Junio C Hamano
2012-07-28 10:22 ` Heiko Voigt
2012-07-28 12:19   ` [PATCH] cleanup argument passing in submodule status command Heiko Voigt
2012-07-29  6:22     ` Junio C Hamano
2012-07-29 15:29       ` Jens Lehmann
2012-07-29 21:57         ` Junio C Hamano
2012-07-29 15:37 ` [PATCH] Enable parallelism in git submodule update Jens Lehmann
2012-11-03 19:07   ` Jens Lehmann
2012-10-30 18:03 szager
2012-10-30 18:03 szager
2012-10-30 18:11 ` Stefan Zager
2012-11-02 21:49   ` Stefan Zager
2012-11-03 15:42   ` Jens Lehmann
2012-11-03 18:44     ` Phil Hord
2012-11-03 19:13       ` Jens Lehmann
