* [PATCH] Enable parallelism in git submodule update.
From: szager @ 2012-10-30 18:03 UTC
To: git; +Cc: jens.lehmann, hvoigt, gitster
The --jobs parameter may be used to set the degree of per-submodule
parallel execution.
Signed-off-by: Stefan Zager <szager@google.com>
---
Documentation/git-submodule.txt | 8 ++++++-
git-submodule.sh | 40 ++++++++++++++++++++++++++++++++++++++-
2 files changed, 46 insertions(+), 2 deletions(-)
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index b4683bb..cb23ba7 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -14,7 +14,8 @@ SYNOPSIS
'git submodule' [--quiet] status [--cached] [--recursive] [--] [<path>...]
'git submodule' [--quiet] init [--] [<path>...]
'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
- [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
+ [--reference <repository>] [--merge] [--recursive]
+ [-j|--jobs [jobs]] [--] [<path>...]
'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
[commit] [--] [<path>...]
'git submodule' [--quiet] foreach [--recursive] <command>
@@ -146,6 +147,11 @@ If the submodule is not yet initialized, and you just want to use the
setting as stored in .gitmodules, you can automatically initialize the
submodule with the `--init` option.
+
+By default, each submodule is treated serially. You may specify a degree of
+parallel execution with the --jobs flag. If a parameter is provided, it is
+the maximum number of jobs to run in parallel; without a parameter, all jobs are
+run in parallel.
++
If `--recursive` is specified, this command will recurse into the
registered submodules, and update any nested submodules within.
+
diff --git a/git-submodule.sh b/git-submodule.sh
index ab6b110..60a5f96 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
USAGE="[--quiet] add [-b branch] [-f|--force] [--reference <repository>] [--] <repository> [<path>]
or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...]
or: $dashless [--quiet] init [--] [<path>...]
- or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
+ or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [-j|--jobs [jobs]] [--] [<path>...]
or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
or: $dashless [--quiet] foreach [--recursive] <command>
or: $dashless [--quiet] sync [--] [<path>...]"
@@ -500,6 +500,7 @@ cmd_update()
{
# parse $args after "submodule ... update".
orig_flags=
+ jobs="1"
while test $# -ne 0
do
case "$1" in
@@ -518,6 +519,20 @@ cmd_update()
-r|--rebase)
update="rebase"
;;
+ -j|--jobs)
+ case "$2" in
+ ''|-*)
+ jobs="0"
+ ;;
+ *)
+ jobs="$2"
+ shift
+ ;;
+ esac
+ # Don't preserve this arg.
+ shift
+ continue
+ ;;
--reference)
case "$2" in '') usage ;; esac
reference="--reference=$2"
@@ -551,11 +566,34 @@ cmd_update()
shift
done
+ # Correctly handle the case where '-q' came before 'update' on the command line.
+ if test -n "$GIT_QUIET"
+ then
+ orig_flags="$orig_flags -q"
+ fi
+
if test -n "$init"
then
cmd_init "--" "$@" || return
fi
+ if test "$jobs" != 1
+ then
+ if ( echo test | xargs -P "$jobs" true 2>/dev/null )
+ then
+ if ( echo test | xargs --max-lines=1 true 2>/dev/null ); then
+ max_lines="--max-lines=1"
+ else
+ max_lines="-L 1"
+ fi
+ module_list "$@" | awk '{print $4}' |
+ xargs $max_lines -P "$jobs" git submodule update $orig_flags
+ return
+ else
+ echo "Warn: parallel execution is not supported on this platform."
+ fi
+ fi
+
cloned_modules=
module_list "$@" | {
err=
--
1.7.7.3
* Re: [PATCH] Enable parallelism in git submodule update.
From: Stefan Zager @ 2012-10-30 18:11 UTC
To: git; +Cc: Jens Lehmann, Heiko Voigt, Junio C Hamano
This is a refresh of a conversation from a couple of months ago.
I didn't try to implement all the desired features (e.g., smart logic
for passing a -j parameter to recursive submodule invocations), but I
did address the one issue that Junio insisted on: the code makes a
best effort to detect whether xargs supports parallel execution on the
host platform, and if it doesn't, then it prints a warning and falls
back to serial execution.
Stefan
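The best-effort detection Stefan describes can be sketched on its own. This is a minimal sketch and not the patch's exact code. The function name xargs_supports_parallel is hypothetical. Unlike the patch, the sketch sends its warning to stderr. It relies only on the fact that an xargs without -P rejects the unknown option and exits non-zero.

```shell
#!/bin/sh
# Probe whether this xargs accepts -P. GNU and BSD xargs provide it,
# but POSIX does not require it.
xargs_supports_parallel () {
	# Run a no-op under "xargs -P N". An xargs without -P rejects the option
	# and exits non-zero, so the exit status of the probe is the answer.
	echo probe | xargs -P "${1:-2}" true >/dev/null 2>&1
}

if xargs_supports_parallel 4
then
	echo "xargs -P is available"
else
	echo "warning: parallel execution is not supported; updating serially" >&2
fi
```

Probing actual behaviour avoids keeping a list of platforms. The trade-off is one extra xargs process per invocation.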
* Re: [PATCH] Enable parallelism in git submodule update.
From: Stefan Zager @ 2012-11-02 21:49 UTC
To: git; +Cc: Jens Lehmann, Heiko Voigt, Junio C Hamano
ping?
* Re: [PATCH] Enable parallelism in git submodule update.
From: Jens Lehmann @ 2012-11-03 15:42 UTC
To: Stefan Zager; +Cc: git, Heiko Voigt, Junio C Hamano
Am 30.10.2012 19:11, schrieb Stefan Zager:
> This is a refresh of a conversation from a couple of months ago.
>
> I didn't try to implement all the desired features (e.g., smart logic
> for passing a -j parameter to recursive submodule invocations), but I
> did address the one issue that Junio insisted on: the code makes a
> best effort to detect whether xargs supports parallel execution on the
> host platform, and if it doesn't, then it prints a warning and falls
> back to serial execution.
I suspect not passing on --jobs recursively like you do here is the
right thing to do, as that would give exponential growth of jobs with
recursion depth, which makes no sense to me.
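The exponential growth can be made concrete with a small calculation. Suppose "-j J" were re-applied at every level, and every submodule had at least J nested submodules. Then each of the J updates at one level would start J more below it, so the deepest level d could have up to J^d updates running at once. The helper peak_jobs is purely illustrative.

```shell
# Hypothetical upper bound on simultaneous updates at recursion depth $2
# if "-j $1" were passed down unchanged to every level.
peak_jobs () {
	_j=$1 _d=$2 _total=1
	while test "$_d" -gt 0
	do
		_total=$((_total * _j))
		_d=$((_d - 1))
	done
	echo "$_total"
}

peak_jobs 4 1   # prints 4
peak_jobs 4 3   # prints 64
```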
A still unsolved issue is the unstructured output from the different
update jobs. It'll be hard (if not impossible) to see in what submodule
which update took place (or failed). I think we should have a solution
for that too (maybe one of those Heiko mentioned or something as simple
as implying "-q"?).
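One possible answer to the output problem is to tag every line a job prints with the path of its submodule. This is a sketch under stated assumptions, and it is not something the thread settled on:

- The echo commands are placeholders for the real per-submodule update.
- run_prefixed is a hypothetical name.
- Paths are assumed to contain no whitespace, quotes or "|".

Lines from different submodules can still interleave, but each one carries its origin.

```shell
# Sketch: run one placeholder job per path, in parallel, and prefix every
# line of its output (stdout and stderr) with "[<path>] ".
run_prefixed () {
	xargs -n 1 -P "${1:-2}" sh -c '
		path=$1
		# Merge stderr into stdout so that failures are tagged as well.
		{ echo "updating"; echo "done"; } 2>&1 | sed "s|^|[$path] |"
	' run_prefixed
}

printf '%s\n' sub/a sub/b sub/c | run_prefixed 3
```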
> Stefan
>
> On Tue, Oct 30, 2012 at 11:03 AM, <szager@google.com> wrote:
>> +By default, each submodule is treated serially. You may specify a degree of
>> +parallel execution with the --jobs flag. If a parameter is provided, it is
>> +the maximum number of jobs to run in parallel; without a parameter, all jobs are
>> +run in parallel.
>> ++
The new "--jobs" option should be documented under "OPTIONS" (perhaps also
noting that "--jobs 0" does the same as "--jobs" alone, and that it is
not supported on all platforms).
* Re: [PATCH] Enable parallelism in git submodule update.
From: Phil Hord @ 2012-11-03 18:44 UTC
To: Jens Lehmann; +Cc: Stefan Zager, git, Heiko Voigt, Junio C Hamano
On Sat, Nov 3, 2012 at 11:42 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
> Am 30.10.2012 19:11, schrieb Stefan Zager:
>> This is a refresh of a conversation from a couple of months ago.
>>
>> I didn't try to implement all the desired features (e.g., smart logic
>> for passing a -j parameter to recursive submodule invocations), but I
>> did address the one issue that Junio insisted on: the code makes a
>> best effort to detect whether xargs supports parallel execution on the
>> host platform, and if it doesn't, then it prints a warning and falls
>> back to serial execution.
>
> I suspect not passing on --jobs recursively like you do here is the
> right thing to do, as that would give exponential growth of jobs with
> recursion depth, which makes no sense to me.
On the other hand, since $jobs is still defined when the recursive
call is made to 'eval cmd_update "$orig_flags"', I suspect the
value *is* passed down recursively. Maybe $jobs should be manually
reset before recursing -- unless it is "0" -- though I expect someone
would feel differently if she had one submodule on level 1 and 10
submodules on level 2. She would be surprised, then, when --jobs=10
seemed to have little effect on performance. So maybe it is best to
leave it as it is, excepting that the apparent attempt not to pass the
switch down is probably misleading.
Phil
* Re: [PATCH] Enable parallelism in git submodule update.
From: Jens Lehmann @ 2012-11-03 19:13 UTC
To: Phil Hord; +Cc: Stefan Zager, git, Heiko Voigt, Junio C Hamano
Am 03.11.2012 19:44, schrieb Phil Hord:
> On Sat, Nov 3, 2012 at 11:42 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>> Am 30.10.2012 19:11, schrieb Stefan Zager:
>>> This is a refresh of a conversation from a couple of months ago.
>>>
>>> I didn't try to implement all the desired features (e.g., smart logic
>>> for passing a -j parameter to recursive submodule invocations), but I
>>> did address the one issue that Junio insisted on: the code makes a
>>> best effort to detect whether xargs supports parallel execution on the
>>> host platform, and if it doesn't, then it prints a warning and falls
>>> back to serial execution.
>>
>> I suspect not passing on --jobs recursively like you do here is the
>> right thing to do, as that would give exponential growth of jobs with
>> recursion depth, which makes no sense to me.
>
> On the other hand, since $jobs is still defined when the recursive
> call is made to 'eval cmd_update "$orig_flags"', I suspect the
> value *is* passed down recursively.
But for $jobs != 1 Stefan's code doesn't use eval cmd_update but
starts the submodule script again:
+ xargs $max_lines -P "$jobs" git submodule update $orig_flags
That should get rid of the $jobs setting, or am I missing something?
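Jens's reasoning can be checked with a small sketch. A shell variable that is set during option parsing but never exported is not part of the environment of a process that xargs starts, so the child sees nothing. cmd_update_sketch and the value 4 are hypothetical stand-ins. In the patch itself, cmd_update also starts with jobs="1", and the -j option is deliberately left out of orig_flags ("Don't preserve this arg"). So as far as the patch shows, a recursive call starts again from 1 unless -j is given again.

```shell
# Sketch: a non-exported variable is invisible to a command run by xargs.
cmd_update_sketch () {
	jobs=4   # hypothetical value parsed from --jobs; never exported
	echo sub/a | xargs sh -c 'echo "child sees jobs=${jobs-unset} for $1"' sh
}

cmd_update_sketch   # prints: child sees jobs=unset for sub/a
```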
> Maybe $jobs should be manually
> reset before recursing -- unless it is "0" -- though I expect someone
> would feel differently if she had one submodule on level 1 and 10
> submodules on level 2. She would be surprised, then, when --jobs=10
> seemed to have little effect on performance.
Hmm, good point. However we implement that, it should at least be
properly documented in the man page (and in the use case you describe
a "git submodule foreach 'git submodule update -j 10'" could be the
solution if we choose to not propagate the jobs option).
> So maybe it is best to
> leave it as it is, excepting that the apparent attempt not to pass the
> switch down is probably misleading.
I didn't test it, but I think it should work (famous last words ;-).
* [PATCH] Enable parallelism in git submodule update.
From: Stefan Zager @ 2012-07-27 18:37 UTC
To: git; +Cc: gitster, jens.lehmann, hvoigt
The --jobs parameter may be used to set the degree of per-submodule
parallel execution.
Signed-off-by: Stefan Zager <szager@google.com>
---
Documentation/git-submodule.txt | 8 +++++++-
git-submodule.sh | 23 ++++++++++++++++++++++-
2 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index fbbbcb2..34f81fb 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -14,7 +14,8 @@ SYNOPSIS
'git submodule' [--quiet] status [--cached] [--recursive] [--] [<path>...]
'git submodule' [--quiet] init [--] [<path>...]
'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
- [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
+ [--reference <repository>] [--merge] [--recursive]
+ [-j|--jobs [jobs]] [--] [<path>...]
'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
[commit] [--] [<path>...]
'git submodule' [--quiet] foreach [--recursive] <command>
@@ -147,6 +148,11 @@ If the submodule is not yet initialized, and you just want to use the
setting as stored in .gitmodules, you can automatically initialize the
submodule with the `--init` option.
+
+By default, each submodule is treated serially. You may specify a degree of
+parallel execution with the --jobs flag. If a parameter is provided, it is
+the maximum number of jobs to run in parallel; without a parameter, all jobs are
+run in parallel.
++
If `--recursive` is specified, this command will recurse into the
registered submodules, and update any nested submodules within.
diff --git a/git-submodule.sh b/git-submodule.sh
index dba4d39..761420a 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
USAGE="[--quiet] add [-b branch] [-f|--force] [--reference <repository>] [--] <repository> [<path>]
or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...]
or: $dashless [--quiet] init [--] [<path>...]
- or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
+ or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] [--rebase] [--reference <repository>] [--merge] [--recursive] [-j|--jobs [jobs]] [--] [<path>...]
or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
or: $dashless [--quiet] foreach [--recursive] <command>
or: $dashless [--quiet] sync [--] [<path>...]"
@@ -473,6 +473,7 @@ cmd_update()
{
# parse $args after "submodule ... update".
orig_flags=
+ jobs="1"
while test $# -ne 0
do
case "$1" in
@@ -491,6 +492,20 @@ cmd_update()
-r|--rebase)
update="rebase"
;;
+ -j|--jobs)
+ case "$2" in
+ ''|-*)
+ jobs="0"
+ ;;
+ *)
+ jobs="$2"
+ shift
+ ;;
+ esac
+ # Don't preserve this arg.
+ shift
+ continue
+ ;;
--reference)
case "$2" in '') usage ;; esac
reference="--reference=$2"
@@ -529,6 +544,12 @@ cmd_update()
cmd_init "--" "$@" || return
fi
+ if test "$jobs" != "1"
+ then
+ module_list "$@" | awk '{print $4}' | xargs -L 1 -P "$jobs" git submodule update $orig_args
+ return
+ fi
+
cloned_modules=
module_list "$@" | {
err=
--
1.7.11.rc2
* Re: [PATCH] Enable parallelism in git submodule update.
From: Junio C Hamano @ 2012-07-27 21:38 UTC
To: Stefan Zager; +Cc: git, jens.lehmann, hvoigt
Stefan Zager <szager@google.com> writes:
> + module_list "$@" | awk '{print $4}' | xargs -L 1 -P "$jobs" git submodule update $orig_args
Capital-P option to xargs is not even in POSIX, no?
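Since -P cannot be assumed on every platform, one POSIX-only alternative is to start background jobs in fixed-size batches and wait for each batch to finish. This is a sketch and was not proposed in the thread. update_one and run_batched are hypothetical names. The approach has two costs: each batch waits for its slowest member, and a plain "wait" discards the individual exit statuses.

```shell
# POSIX-only sketch: run at most $1 background jobs at a time, in batches.
# update_one is a hypothetical placeholder for the real per-submodule update.
update_one () {
	echo "updated $1"
}

run_batched () {
	max=$1
	running=0
	while IFS= read -r path
	do
		update_one "$path" &
		running=$((running + 1))
		if test "$running" -ge "$max"
		then
			wait            # block until the whole batch has finished
			running=0
		fi
	done
	wait                    # collect the final, possibly partial, batch
}

printf '%s\n' sub/a sub/b sub/c | run_batched 2
```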
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] Enable parallelism in git submodule update.
2012-07-27 18:37 Stefan Zager
2012-07-27 21:38 ` Junio C Hamano
@ 2012-07-28 10:22 ` Heiko Voigt
2012-07-29 15:37 ` Jens Lehmann
2 siblings, 0 replies; 15+ messages in thread
From: Heiko Voigt @ 2012-07-28 10:22 UTC
To: Stefan Zager; +Cc: git, gitster, jens.lehmann
Hi Stefan,
Neat patch. See below for a few notes.
On Fri, Jul 27, 2012 at 11:37:34AM -0700, Stefan Zager wrote:
> diff --git a/git-submodule.sh b/git-submodule.sh
> index dba4d39..761420a 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -491,6 +492,20 @@ cmd_update()
> -r|--rebase)
> update="rebase"
> ;;
> + -j|--jobs)
> + case "$2" in
> + ''|-*)
> + jobs="0"
> + ;;
> + *)
> + jobs="$2"
> + shift
> + ;;
> + esac
> + # Don't preserve this arg.
> + shift
> + continue
> + ;;
> --reference)
> case "$2" in '') usage ;; esac
> reference="--reference=$2"
> @@ -529,6 +544,12 @@ cmd_update()
> cmd_init "--" "$@" || return
> fi
>
> + if test "$jobs" != "1"
> + then
> + module_list "$@" | awk '{print $4}' | xargs -L 1 -P "$jobs" git submodule update $orig_args
I do not see orig_args set anywhere in git-submodule.sh. It seems the
existing use of it in cmd_status() is a leftover from commit 98dbe63,
when this variable was renamed to orig_flags.
I will follow up with a patch for that location.
Another problem here is the passing of arguments. Have a look at
a7eff1a8 to see how this was solved for other locations.
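For reference, here is a minimal sketch of the quote-then-eval idiom. It assumes, as the `# Don't preserve this arg.` comment in the patch suggests, that each accepted flag is appended to `orig_flags` in shell-quoted form. `sq_quote` is a hypothetical stand-in for `git rev-parse --sq-quote`, so the example runs without git. It shows why the saved flags cannot simply be expanded unquoted on a command line such as the xargs invocation above: word splitting breaks the quoting apart, while `eval` restores the original arguments.

```shell
#!/bin/sh
# Hypothetical stand-in for "git rev-parse --sq-quote": wrap one argument
# in single quotes, rewriting each embedded ' as '\''.
# (Command substitution drops trailing newlines; fine for a sketch.)
sq_quote () {
	printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Collect flags the way a parse loop might accumulate orig_flags.
flags=
for arg in --reference "/path with spaces" "it's"
do
	flags="$flags $(sq_quote "$arg")"
done

# Print each received argument on its own line, in angle brackets.
show_args () {
	for a in "$@"
	do
		printf '<%s>\n' "$a"
	done
}

# Wrong: unquoted $flags is split on blanks and the quotes stay literal.
show_args $flags
# Right: eval re-parses the quoting and restores the three arguments.
eval "show_args $flags"
```

The unquoted call receives five mangled words; the `eval` call receives exactly `--reference`, `/path with spaces` and `it's`.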
The next thing I noticed is that the parallelism is not recursive: you
drop the option and only execute the first level in parallel. How about
using the number of modules named by the arguments left in $@ to decide
whether you need to fork parallel execution or not? If there is exactly
one, you do the update; if there are more, you do the parallel thing.
That way you can just keep passing the --jobs flag to the
subprocesses.
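The suggestion above can be sketched roughly as follows. This is hypothetical: `update_one` and `nested_paths` stand in for the real per-submodule update and for listing a submodule's own submodules, and the job count is carried down but not enforced, to keep the example short. With exactly one path the function does the work itself and recurses; with several it forks one child per path, each of which receives exactly one path and the same job count, so nested submodules are parallelized as well.

```shell
#!/bin/sh
# Hypothetical stand-ins so the sketch runs on its own:
# update_one does the serial update of one path, nested_paths lists that
# submodule's own submodules (a fixed two-level tree here).
update_one () {
	printf 'updated %s\n' "$1"
}
nested_paths () {
	case "$1" in
	lib) printf '%s\n' lib/a lib/b ;;
	esac
}

# update_paths JOBS PATH...
update_paths () {
	jobs=$1
	shift
	if test $# -eq 1
	then
		# Exactly one path: do the work in this process, then recurse
		# into nested submodules, passing the same job count down.
		update_one "$1" || return
		set -- $(nested_paths "$1")
		test $# -eq 0 || update_paths "$jobs" "$@"
		return
	fi
	# Several paths: fork one child per path. Each child sees exactly one
	# path, so it takes the branch above. (JOBS is not enforced here.)
	pids=
	status=0
	for path in "$@"
	do
		update_paths "$jobs" "$path" &
		pids="$pids $!"
	done
	for pid in $pids
	do
		wait "$pid" || status=1
	done
	return "$status"
}

update_paths 4 app lib
```

Because the branch is chosen purely by `$#`, the same entry point serves both the top-level call and every recursive one, which is what lets the flag be passed through unchanged.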
The next question to solve is the UI: since the output lines of the
parallel update jobs will be interleaved, we need some way to tell them
apart. Imagine one of the updates fails somewhere: how do we find out
which one it was?
Two possible solutions come to mind:
1. Prefix each line with a job number. This way you can tell which
process produced which output and still get immediate feedback.
2. Cache the output (stdout and stderr) of each job and print it
once that job is done. I imagine this needs some infrastructure that
we would have to implement. We already have some ideas on how to
collect such output in C here[1].
I would prefer solution 2, since the output of 1 will be hard to read,
but I guess we could start with 1 and move over to 2 later on.
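Both output strategies can be sketched in a few lines of shell. `work` is a hypothetical placeholder for one submodule update, and `mktemp -d` is assumed to be available (it is common but not specified by POSIX). `run_prefixed` tags each line with its job number as it arrives (solution 1); `run_buffered` sends each job's stdout and stderr to its own file and prints a job's output in one piece once it has finished, in start order (a simple form of solution 2).

```shell
#!/bin/sh
# Hypothetical placeholder for one submodule update; writes to stdout
# and stderr like a real fetch/checkout might.
work () {
	printf 'fetching %s\n' "$1"
	printf 'checked out %s\n' "$1" >&2
}

# Solution 1: prefix every line with its job number as it is produced.
# Lines from different jobs still interleave, but each is attributable.
# (The pipeline's status is sed's, so work's failures are not reported.)
run_prefixed () {
	n=0
	pids=
	for item in "$@"
	do
		n=$((n + 1))
		work "$item" 2>&1 | sed "s/^/[$n] /" &
		pids="$pids $!"
	done
	for pid in $pids
	do
		wait "$pid"
	done
}

# Solution 2: send each job's stdout and stderr to its own file, then
# print each job's output in one piece after it finishes, in start order.
run_buffered () {
	tmpdir=$(mktemp -d) || return
	n=0
	pids=
	for item in "$@"
	do
		n=$((n + 1))
		work "$item" >"$tmpdir/$n" 2>&1 &
		pids="$pids $!:$n"
	done
	status=0
	for entry in $pids
	do
		wait "${entry%%:*}" || status=1
		cat "$tmpdir/${entry#*:}"
	done
	rm -rf "$tmpdir"
	return "$status"
}

run_prefixed lib app
run_buffered lib app
```

Printing in start order keeps the buffered output deterministic at the cost of latency: a slow first job delays the output of faster ones that finished earlier.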
Cheers Heiko
[1] http://article.gmane.org/gmane.comp.version-control.git/197747
* Re: [PATCH] Enable parallelism in git submodule update.
2012-07-27 18:37 Stefan Zager
2012-07-27 21:38 ` Junio C Hamano
2012-07-28 10:22 ` Heiko Voigt
@ 2012-07-29 15:37 ` Jens Lehmann
2012-11-03 19:07 ` Jens Lehmann
2 siblings, 1 reply; 15+ messages in thread
From: Jens Lehmann @ 2012-07-29 15:37 UTC
To: Stefan Zager; +Cc: git, gitster, hvoigt
On 27.07.2012 20:37, Stefan Zager wrote:
> The --jobs parameter may be used to set the degree of per-submodule
> parallel execution.
I think this is a sound idea, but it would be good to see some
actual measurements. What are the performance numbers with and
without this change? Which cases do benefit and are there some
which run slower when run in parallel?
* Re: [PATCH] Enable parallelism in git submodule update.
2012-07-29 15:37 ` Jens Lehmann
@ 2012-11-03 19:07 ` Jens Lehmann
0 siblings, 0 replies; 15+ messages in thread
From: Jens Lehmann @ 2012-11-03 19:07 UTC
To: Stefan Zager; +Cc: git, gitster, hvoigt
On 29.07.2012 17:37, Jens Lehmann wrote:
> On 27.07.2012 20:37, Stefan Zager wrote:
>> The --jobs parameter may be used to set the degree of per-submodule
>> parallel execution.
>
> I think this is a sound idea, but it would be good to see some
> actual measurements. What are the performance numbers with and
> without this change? Which cases do benefit and are there some
> which run slower when run in parallel?
ping?
Thread overview: 15+ messages
2012-10-30 18:03 [PATCH] Enable parallelism in git submodule update szager
2012-10-30 18:11 ` Stefan Zager
2012-11-02 21:49 ` Stefan Zager
2012-11-03 15:42 ` Jens Lehmann
2012-11-03 18:44 ` Phil Hord
2012-11-03 19:13 ` Jens Lehmann
2012-10-30 18:03 szager
2012-07-27 18:37 Stefan Zager
2012-07-27 21:38 ` Junio C Hamano
[not found] ` <CAHOQ7J_jYAe7r1q6Cg9OJb8f+79UfS=JfRk9NrS4R4a+oLM8LA@mail.gmail.com>
2012-07-27 23:25 ` Junio C Hamano
2012-07-28 10:52 ` Heiko Voigt
2012-07-29 21:59 ` Junio C Hamano
2012-07-28 10:22 ` Heiko Voigt
2012-07-29 15:37 ` Jens Lehmann
2012-11-03 19:07 ` Jens Lehmann