* [PATCH] Add fetch.recurseSubmoduleParallelism config option
@ 2015-10-12 22:52 Stefan Beller
  2015-10-12 23:14 ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Beller @ 2015-10-12 22:52 UTC (permalink / raw)
  To: gitster; +Cc: git, hvoigt, jens.lehmann, Stefan Beller

This allows configuring fetching in parallel without having the annoying
command line option.

This moves the responsibility for determining how many parallel processes
to start from builtin/fetch to submodule.c, as we need a way to communicate
"the user did not specify the number of parallel processes on the command
line" to the fetch builtin. The submodule code takes care of
the precedence (CLI > config > default).

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/config.txt |  6 ++++++
 builtin/fetch.c          |  2 +-
 submodule.c              | 14 ++++++++++++++
 3 files changed, 21 insertions(+), 1 deletion(-)
 
 I just monkey tested the code and it worked once! The problem with testing
 this parallelization option is that the expected behavior doesn't change
 except for being faster. And I don't want to add timing tests to the test
 suite, because they are unreliable.
 
 Any idea how to test this properly?
 
 This applies on top of sb/submodule-parallel-fetch
 
 Thanks,
 Stefan
 

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 315f271..1172db0 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1140,6 +1140,12 @@ fetch.recurseSubmodules::
 	when its superproject retrieves a commit that updates the submodule's
 	reference.
 
+fetch.recurseSubmoduleParallelism::
+	This determines how many submodules can be fetched in
+	parallel. Specifying a positive integer allows up to that
+	number of submodules to be fetched in parallel. Specifying 0
+	uses the number of CPUs as the maximum.
+
 fetch.fsckObjects::
 	If it is set to true, git-fetch-pack will check all fetched
 	objects. It will abort in the case of a malformed object or a
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f28eac6..b1399dc 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -37,7 +37,7 @@ static int prune = -1; /* unspecified */
 static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
 static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow;
-static int max_children = 1;
+static int max_children = -1;
 static const char *depth;
 static const char *upload_pack;
 static struct strbuf default_rla = STRBUF_INIT;
diff --git a/submodule.c b/submodule.c
index c21b265..c85d3ef 100644
--- a/submodule.c
+++ b/submodule.c
@@ -15,6 +15,7 @@
 #include "thread-utils.h"
 
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
+static int config_fetch_parallel_submodules = -1;
 static struct string_list changed_submodule_paths;
 static int initialized_fetch_ref_tips;
 static struct sha1_array ref_tips_before_fetch;
@@ -179,6 +180,14 @@ int submodule_config(const char *var, const char *value, void *cb)
 	else if (!strcmp(var, "fetch.recursesubmodules")) {
 		config_fetch_recurse_submodules = parse_fetch_recurse_submodules_arg(var, value);
 		return 0;
+	} else if (!strcmp(var, "fetch.recursesubmoduleparallelism")) {
+		char *end;
+		if (!value)
+			return config_error_nonbool(var);
+		config_fetch_parallel_submodules = strtol(value, &end, 0);
+		if (*end)
+			warning("value for fetch.recurseSubmoduleParallelism not recognized");
+		return 0;
 	}
 	return 0;
 }
@@ -759,6 +768,11 @@ int fetch_populated_submodules(const struct argv_array *options,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = config_fetch_parallel_submodules;
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = 1;
+
 	calculate_changed_submodule_paths();
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
-- 
2.5.0.267.g8d6e698.dirty

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] Add fetch.recurseSubmoduleParallelism config option
  2015-10-12 22:52 [PATCH] Add fetch.recurseSubmoduleParallelism config option Stefan Beller
@ 2015-10-12 23:14 ` Junio C Hamano
  2015-10-12 23:31   ` Stefan Beller
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2015-10-12 23:14 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, hvoigt, jens.lehmann

Stefan Beller <sbeller@google.com> writes:

> This allows to configure fetching in parallel without having the annoying
> command line option.

s/annoying//;

I think this is a sane thing to do, but the name of the variable may
want to be bikeshedded a bit.

> This moved the responsibility to determine how many parallel processes
> to start from builtin/fetch to submodule.c as we need a way to communicate
> "The user did not specify the number of parallel processes in the command
> line options" in the builtin fetch. The submodule code takes care of
> the precedence (CLI > config > default)
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  Documentation/config.txt |  6 ++++++
>  builtin/fetch.c          |  2 +-
>  submodule.c              | 14 ++++++++++++++
>  3 files changed, 21 insertions(+), 1 deletion(-)
>  
>  I just monkey tested the code and it worked once! The problem with testing
>  this parallelizing option is that the expected behavior doesn't change
>  except for being faster. And I don't want to add timing tests to the test
>  suite because they are unreliable.
>
>  Any idea how to test this properly?

I agree that a test in t/ would catch bugs in the functionality.  If
your parallel implementation is somehow broken in the future and
stops functioning correctly, fetching all submodules with a single
task and fetching them with N tasks will produce different results
;-).

But it would not help you much in seeing if the parallelism is
really taking place.  Adding t/perf/ tests to show how much benefit
you are getting may be of more value.

The parallel_process API could learn a new "verbose" feature so that
it by itself shows some messages like

    "processing the 'frotz' job with N tasks"
    "M tasks finished (N still running)" 

in the output stream from strategic places.  For example, the first
message will come at the end of pp_init(), and the second message
will be appended at the end of buffered output of a task that has
just been finished.  Once you have something like that, you could
check for them in a test in t/.

Just a thought.



* Re: [PATCH] Add fetch.recurseSubmoduleParallelism config option
  2015-10-12 23:14 ` Junio C Hamano
@ 2015-10-12 23:31   ` Stefan Beller
  2015-10-12 23:50     ` Junio C Hamano
  2015-10-13  7:32     ` Junio C Hamano
  0 siblings, 2 replies; 9+ messages in thread
From: Stefan Beller @ 2015-10-12 23:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Heiko Voigt, Jens Lehmann

On Mon, Oct 12, 2015 at 4:14 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> This allows to configure fetching in parallel without having the annoying
>> command line option.
>
> s/annoying//;
>
> I think this is a sane thing to do, but the name of the variable may
> want to be bikeshedded a bit.

That's actually what I thought about most for this patch. I expected
bikeshedding on the workflow as well (passing around -1 for each unset
option).

We should not include "threads"/"processes" in the name, as that is an
implementation detail and should not be exposed to the user (looking at
pack.threads).

There are some options using max_* for configuring parallel stuff
such as http.maxRequests.

There is core.preloadIndex to enable parallel index preload, but
that is a boolean and gives no fine control to the user. We want to give
the user fine control here, I'd assume.

Maybe the more fundamental question needs to be answered first:
do we want to stay in the "fetch." prefix? We could also make it a
submodule-specific thing (submodule.jobs), but maybe that would
collide with submodule.<name>.<foo>? (Originally I wanted to
postpone this patch until I had parallelized git submodule update,
so a "fetch." prefix may not be good, as I'd guess we want these two
operations to use the same config option.)

>>
>>  I just monkey tested the code and it worked once! The problem with testing
>>  this parallelizing option is that the expected behavior doesn't change
>>  except for being faster. And I don't want to add timing tests to the test
>>  suite because they are unreliable.
>>
>>  Any idea how to test this properly?
>
> I agree that a test in t/ would catch bugs in the functionality.  If
> your parallel implementation is somehow broken in the future and
> stops functioning correctly, fetching all submodules with a single
> task and fetching them with N tasks will produce different results
> ;-).
>
> But it would not help you much in seeing if the parallelism is
> really taking place.  Adding t/perf/ tests to show how much benefit
> you are getting may be of more value.
>
> The parallel_process API could learn a new "verbose" feature that it
> by itself shows some messages like
>
>     "processing the 'frotz' job with N tasks"
>     "M tasks finished (N still running)"

I know what to fill in for M and N; 'frotz' is a bit unclear to me.
Would you imagine that to be passed in as a hardcoded string?
git fetch --recurse-submodules would pass in "Fetching submodules",
but <foo> would pass in an actual "frotz", or would you assume that to
be computed from the task data somehow?

>
> in the output stream from strategic places.  For example, the first
> message will come at the end of pp_init(), and the second message
> will be appended at the end of buffered output of a task that has
> just been finished.  Once you have something like that, you could
> check for them in a test in t/.
>
> Just a thought.

I like that thought. :)



* Re: [PATCH] Add fetch.recurseSubmoduleParallelism config option
  2015-10-12 23:31   ` Stefan Beller
@ 2015-10-12 23:50     ` Junio C Hamano
  2015-10-16 17:04       ` Stefan Beller
  2015-10-13  7:32     ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2015-10-12 23:50 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Heiko Voigt, Jens Lehmann

Stefan Beller <sbeller@google.com> writes:

> There is core.preloadIndex to enable parallel index preload, but
> that is boolean and not giving fine control to the user. We want to give
> fine control to the user here I'd assume.

I'd approach this as "fetching multiple submodules at a time", if I
were deciding its name.

> ... We could also make it a
> submodule specifc thing (submodule.jobs), but that would collide
> with submodule.<name>.<foo> maybe?

I do not think so.  You can have

	area.attr1
	area.attr3
        area."userthing1".attr1
        area."userthing2".attr1

and the parser can differentiate them just fine.

So if you want

    [submodule]
        fetchParallel = 16
	updateParallel = 4

I do not think that would interfere with any

    [submodule "name"]
    	var = val

You can even choose to allow an attribute that is fundamentally per
"userthing" (e.g. the branch, the remote, the submodule), defined
with area."userthing".attr, while making area.attr the fallback
value for any unspecified area."userthing9".attr (I think the http.*.*
hierarchy takes that approach). But I do not think the parallelism
of fetching is something that should be specified per submodule.

>> The parallel_process API could learn a new "verbose" feature that it
>> by itself shows some messages like
>>
>>     "processing the 'frotz' job with N tasks"
>>     "M tasks finished (N still running)"
>
> I know what to fill in for M and N, 'frotz' is a bit unclear to me.

The caller would pass the label to pp_init(); in this codepath
perhaps it will say 'submodule fetch' or something.


* Re: [PATCH] Add fetch.recurseSubmoduleParallelism config option
  2015-10-12 23:31   ` Stefan Beller
  2015-10-12 23:50     ` Junio C Hamano
@ 2015-10-13  7:32     ` Junio C Hamano
  2015-10-13 16:03       ` Stefan Beller
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2015-10-13  7:32 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Heiko Voigt, Jens Lehmann

Stefan Beller <sbeller@google.com> writes:

>> The parallel_process API could learn a new "verbose" feature that it
>> by itself shows some messages like
>>
>>     "processing the 'frotz' job with N tasks"
>>     "M tasks finished (N still running)"
>
> I know what to fill in for M and N, 'frotz' is a bit unclear to me.

At least I don't know what M and N should be, and I'm curious how
you'll define them.  See below.

>> in the output stream from strategic places.  For example, the first
>> message will come at the end of pp_init(), and the second message
>> will be appended at the end of buffered output of a task that has
>> just been finished.  Once you have something like that, you could
>> check for them in a test in t/.
>>
>> Just a thought.
>
> I like that thought. :)


A few more random thoughts:

 * The only thing you could rely on if you were to use the above in
   your tests is the one from pp_init() that declares how many
   processes the machinery is going to use.  M/N will be unstable,
   depending on the scheduling order (e.g. the foreground process
   may take a lot of time to finish, while many other processes
   finish first).

 * Every time the foreground process (i.e. the one whose output is
   tee-ed to the overall output from the machinery) finishes, you
   can emit "M tasks finished (N still running)", but I am not sure
   what M should be.  It is debatable how to account for background
   processes that have already completed but whose output hasn't
   been shown.

   One school of thought that is in line with the "pretend as if the
   background tasks are started immediately after the foreground
   task finishes, and they run at infinite speed and produce output
   in no time" idea, on which the "queue output from the background
   processes and emit at once in order to avoid intermixing" design
   was based on, would be not to include them in M (i.e. finished
   ones), because their output haven't been emitted and we are
   pretending that they haven't even been started.  If you take this
   approach, you however may have to include them in N (i.e. still
   running), but that would likely bump N beyond the maximum number
   of simultaneous processes.

   The other school of thought would of course tell the truth and
   include the number of finished background processes in M, as they
   have finished already in the reality.  This will not risk showing
   N that is beyond the maximum, but your first "progress" output
   might say "3 tasks finished", which will make it look odd in a
   different way.


* Re: [PATCH] Add fetch.recurseSubmoduleParallelism config option
  2015-10-13  7:32     ` Junio C Hamano
@ 2015-10-13 16:03       ` Stefan Beller
  2015-10-13 21:07         ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Beller @ 2015-10-13 16:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Heiko Voigt, Jens Lehmann

On Tue, Oct 13, 2015 at 12:32 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>>> The parallel_process API could learn a new "verbose" feature that it
>>> by itself shows some messages like
>>>
>>>     "processing the 'frotz' job with N tasks"
>>>     "M tasks finished (N still running)"
>>
>> I know what to fill in for M and N, 'frotz' is a bit unclear to me.
>
> At least I don't know what M and N should be, and I'm curious how
> you'll define them.  See below.

I presumed the second school of thought. Another alternative there would
be to have 3 numbers:

    "M tasks finished (N still running, K pending output)"

> A few more random thoughts:
>
>  * The only thing you could rely on if you were to use the above in
>    your tests is the one from pp_init() that declares how many
>    processes the machinery is going to use.  M/N will be unstable,
>    depending on the scheduling order (e.g. the foreground process
>    may take a lot of time to finish, while many other processes
>    finish first).
>
>  * Every time the foreground process (i.e. the one whose output is
>    tee-ed to the overall output from the machinery) finishes, you
>    can emit "M tasks finished (N still running)", but I am not sure
>    what M should be.  It is debatable how to account for background
>    processes that have already completed but whose output haven't
>    been shown.

Assuming we go with your second school of thought (N being the processes
really running, M including the finished tasks whose output is still
pending), we may produce confusing output, as the ordering may confuse
the reader:

    N=8 M=13 (output from live task)
    ...
    N=8 M=12 (output from buffered task)
    ...

Anyone who has no knowledge of the internals wonders why
M goes *down*?



* Re: [PATCH] Add fetch.recurseSubmoduleParallelism config option
  2015-10-13 16:03       ` Stefan Beller
@ 2015-10-13 21:07         ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2015-10-13 21:07 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Heiko Voigt, Jens Lehmann

Stefan Beller <sbeller@google.com> writes:

> Assuming we go with your second school of thought (N are the real
> running processes, M including the finished but still pending output tasks),

That's neither of my two, I would think.

Anyway, I think it is now clear that it is not very easy to say "I
know what to fill in M and N" there ;-).


* Re: [PATCH] Add fetch.recurseSubmoduleParallelism config option
  2015-10-12 23:50     ` Junio C Hamano
@ 2015-10-16 17:04       ` Stefan Beller
  2015-10-16 17:26         ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Beller @ 2015-10-16 17:04 UTC (permalink / raw)
  To: Junio C Hamano, Jonathan Nieder; +Cc: git, Heiko Voigt, Jens Lehmann

On Mon, Oct 12, 2015 at 4:50 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> There is core.preloadIndex to enable parallel index preload, but
>> that is boolean and not giving fine control to the user. We want to give
>> fine control to the user here I'd assume.
>
> I'd approach this as "fetching multiple submodules at a time", if I
> were deciding its name.
>

so maybe
    fetch.recurseSubmoduleJobs
    fetch.submoduleJobs
    fetch.jobs
    fetch.connectionsToUse

Eventually we also want to be parallel in git fetch --all; when using
the latter two, we could reuse these options there too, with no need to
support different options for fetch --all and fetch --recurse-submodules.


> So if you want
>
>     [submodule]
>         fetchParallel = 16
>         updateParallel = 4

So you would have different settings here for only slightly different things?
So the series I sent out yesterday evening would use updateParallel
for parallel cloning instead?


* Re: [PATCH] Add fetch.recurseSubmoduleParallelism config option
  2015-10-16 17:04       ` Stefan Beller
@ 2015-10-16 17:26         ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2015-10-16 17:26 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Nieder, git, Heiko Voigt, Jens Lehmann

Stefan Beller <sbeller@google.com> writes:

> so maybe
>     fetch.recurseSubmoduleJobs
>     fetch.submoduleJobs
>     fetch.jobs
>     fetch.connectionsToUse

"git remote update" is another example that may want to run multiple
independent 'git fetch' in parallel.  I think "When the operation I
ask involves fetching from multiple places, I want N instances of
them to be executed", regardless of the kind of operation ("remote
update" or "submodule update", etc.), would match the end-user's
point of view the best, if you want to give them a "set this single
thing to apply to all of them" fallback default.

If you want to give them a finer-grained control, you would need to
differentiate what kind of fetch would use N tasks (as opposed to
other kind of fetch that uses M tasks) and the name would need to
have "submodule" in it for that differentiation.

>> So if you want
>>
>>     [submodule]
>>         fetchParallel = 16
>>         updateParallel = 4
>
> So you would have different settings here for only slightly different things?

I was just showing you that it is _possible_ if you want to give
finer control.  For example, you can define:

 * 'submodule.parallel', if defined gives the values for the
   following more specific ones if they aren't given.

 * 'submodule.fetchParallel' specifies how many tasks are run in
   'fetch --recurse-submodules'.

 * 'submodule.updateParallel' specifies how many tasks are run in
   'submodule update'.

so that those who want finer controls can, and those who don't can
set a single one to apply to all.
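Concretely, such a layered scheme might look like this in a config file (the key names come from the discussion above and are hypothetical, not real git configuration):

```ini
[submodule]
	# fallback for every parallel submodule operation
	parallel = 8
	# overrides the fallback for 'fetch --recurse-submodules'
	fetchParallel = 16
	# updateParallel is unset, so 'submodule update' uses parallel = 8
```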

If you want to start with a globally single setting, that is
perfectly fine.

