* [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7
@ 2015-12-16  0:04 Stefan Beller
  2015-12-16  0:04 ` [PATCHv2 1/7] submodule.c: write "Fetching submodule <foo>" to stderr Stefan Beller
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Stefan Beller @ 2015-12-16  0:04 UTC (permalink / raw)
  To: git, gitster
  Cc: peff, jrnieder, johannes.schindelin, Jens.Lehmann, ericsunshine,
	j6t, Stefan Beller

I am sending out a new version to replace sb/submodule-parallel-fetch
for the time after the 2.7 release.

* Dropped the patch that introduced xread_nonblock.
* strbuf_read_once now uses xread. This is safe because we poll before
  calling strbuf_read_once, so we know we won't stall.
* Reworded the commit message of "run-command: add an asynchronous
  parallel child processor" following Johannes' suggestion.
  
  Thanks,
  Stefan

Jonathan Nieder (1):
  submodule.c: write "Fetching submodule <foo>" to stderr

Stefan Beller (6):
  xread: poll on non blocking fds
  strbuf: add strbuf_read_once to read without blocking
  sigchain: add command to pop all common signals
  run-command: add an asynchronous parallel child processor
  fetch_populated_submodules: use new parallel job processing
  submodules: allow parallel fetching, add tests and documentation

 Documentation/fetch-options.txt |   7 +
 builtin/fetch.c                 |   6 +-
 builtin/pull.c                  |   6 +
 run-command.c                   | 335 ++++++++++++++++++++++++++++++++++++++++
 run-command.h                   |  80 ++++++++++
 sigchain.c                      |   9 ++
 sigchain.h                      |   1 +
 strbuf.c                        |  11 ++
 strbuf.h                        |   8 +
 submodule.c                     | 141 +++++++++++------
 submodule.h                     |   2 +-
 t/t0061-run-command.sh          |  53 +++++++
 t/t5526-fetch-submodules.sh     |  71 ++++++---
 test-run-command.c              |  55 ++++++-
 wrapper.c                       |  20 ++-
 15 files changed, 731 insertions(+), 74 deletions(-)

-- 
2.6.4.443.ge094245.dirty

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCHv2 1/7] submodule.c: write "Fetching submodule <foo>" to stderr
  2015-12-16  0:04 [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
@ 2015-12-16  0:04 ` Stefan Beller
  2015-12-16  0:04 ` [PATCHv2 2/7] xread: poll on non blocking fds Stefan Beller
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2015-12-16  0:04 UTC (permalink / raw)
  To: git, gitster
  Cc: peff, jrnieder, johannes.schindelin, Jens.Lehmann, ericsunshine,
	j6t, Stefan Beller

From: Jonathan Nieder <jrnieder@gmail.com>

The "Pushing submodule <foo>" progress output correctly goes to
stderr, but "Fetching submodule <foo>" is going to stdout by
mistake.  Fix it to write to stderr.

Noticed while trying to implement a parallel submodule fetch.  When
this particular output line went to a different file descriptor, it
was buffered separately, resulting in wrongly interleaved output if
we copied it to the terminal naively.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule.c                 |  2 +-
 t/t5526-fetch-submodules.sh | 51 +++++++++++++++++++++++----------------------
 2 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/submodule.c b/submodule.c
index 14e7624..8386477 100644
--- a/submodule.c
+++ b/submodule.c
@@ -689,7 +689,7 @@ int fetch_populated_submodules(const struct argv_array *options,
 			git_dir = submodule_git_dir.buf;
 		if (is_directory(git_dir)) {
 			if (!quiet)
-				printf("Fetching submodule %s%s\n", prefix, ce->name);
+				fprintf(stderr, "Fetching submodule %s%s\n", prefix, ce->name);
 			cp.dir = submodule_path.buf;
 			argv_array_push(&argv, default_argv);
 			argv_array_push(&argv, "--submodule-prefix");
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index a4532b0..17759b14 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -16,7 +16,8 @@ add_upstream_commit() {
 		git add subfile &&
 		git commit -m new subfile &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "From $pwd/submodule" > ../expect.err &&
+		echo "Fetching submodule submodule" > ../expect.err &&
+		echo "From $pwd/submodule" >> ../expect.err &&
 		echo "   $head1..$head2  master     -> origin/master" >> ../expect.err
 	) &&
 	(
@@ -27,6 +28,7 @@ add_upstream_commit() {
 		git add deepsubfile &&
 		git commit -m new deepsubfile &&
 		head2=$(git rev-parse --short HEAD) &&
+		echo "Fetching submodule submodule/subdir/deepsubmodule" >> ../expect.err &&
 		echo "From $pwd/deepsubmodule" >> ../expect.err &&
 		echo "   $head1..$head2  master     -> origin/master" >> ../expect.err
 	)
@@ -56,9 +58,7 @@ test_expect_success setup '
 	(
 		cd downstream &&
 		git submodule update --init --recursive
-	) &&
-	echo "Fetching submodule submodule" > expect.out &&
-	echo "Fetching submodule submodule/subdir/deepsubmodule" >> expect.out
+	)
 '
 
 test_expect_success "fetch --recurse-submodules recurses into submodules" '
@@ -67,7 +67,7 @@ test_expect_success "fetch --recurse-submodules recurses into submodules" '
 		cd downstream &&
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -96,7 +96,7 @@ test_expect_success "using fetchRecurseSubmodules=true in .gitmodules recurses i
 		git config -f .gitmodules submodule.submodule.fetchRecurseSubmodules true &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -127,7 +127,7 @@ test_expect_success "--recurse-submodules overrides fetchRecurseSubmodules setti
 		git config --unset -f .gitmodules submodule.submodule.fetchRecurseSubmodules &&
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -146,7 +146,7 @@ test_expect_success "--dry-run propagates to submodules" '
 		cd downstream &&
 		git fetch --recurse-submodules --dry-run >../actual.out 2>../actual.err
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -155,7 +155,7 @@ test_expect_success "Without --dry-run propagates to submodules" '
 		cd downstream &&
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -166,7 +166,7 @@ test_expect_success "recurseSubmodules=true propagates into submodules" '
 		git config fetch.recurseSubmodules true
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -180,7 +180,7 @@ test_expect_success "--recurse-submodules overrides config in submodule" '
 		) &&
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -214,16 +214,15 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "Fetching submodule submodule" > expect.out.sub &&
 	echo "From $pwd/." > expect.err.sub &&
 	echo "   $head1..$head2  master     -> origin/master" >>expect.err.sub &&
-	head -2 expect.err >> expect.err.sub &&
+	head -3 expect.err >> expect.err.sub &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_i18ncmp expect.err.sub actual.err &&
-	test_i18ncmp expect.out.sub actual.out
+	test_must_be_empty actual.out
 '
 
 test_expect_success "Recursion doesn't happen when new superproject commits don't change any submodules" '
@@ -269,7 +268,7 @@ test_expect_success "Recursion picks up config in submodule" '
 		)
 	) &&
 	test_i18ncmp expect.err.sub actual.err &&
-	test_i18ncmp expect.out actual.out
+	test_must_be_empty actual.out
 '
 
 test_expect_success "Recursion picks up all submodules when necessary" '
@@ -285,7 +284,8 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule"
 		head2=$(git rev-parse --short HEAD) &&
-		echo "From $pwd/submodule" > ../expect.err.sub &&
+		echo "Fetching submodule submodule" > ../expect.err.sub &&
+		echo "From $pwd/submodule" >> ../expect.err.sub &&
 		echo "   $head1..$head2  master     -> origin/master" >> ../expect.err.sub
 	) &&
 	head1=$(git rev-parse --short HEAD) &&
@@ -295,13 +295,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 	echo "From $pwd/." > expect.err.2 &&
 	echo "   $head1..$head2  master     -> origin/master" >> expect.err.2 &&
 	cat expect.err.sub >> expect.err.2 &&
-	tail -2 expect.err >> expect.err.2 &&
+	tail -3 expect.err >> expect.err.2 &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_i18ncmp expect.err.2 actual.err &&
-	test_i18ncmp expect.out actual.out
+	test_must_be_empty actual.out
 '
 
 test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no new commits are fetched in the superproject (and ignores config)" '
@@ -317,7 +317,8 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule" &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "From $pwd/submodule" > ../expect.err.sub &&
+		echo "Fetching submodule submodule" > ../expect.err.sub &&
+		echo "From $pwd/submodule" >> ../expect.err.sub &&
 		echo "   $head1..$head2  master     -> origin/master" >> ../expect.err.sub
 	) &&
 	(
@@ -335,7 +336,7 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	tail -2 expect.err > expect.err.deepsub &&
+	tail -3 expect.err > expect.err.deepsub &&
 	echo "From $pwd/." > expect.err &&
 	echo "   $head1..$head2  master     -> origin/master" >>expect.err &&
 	cat expect.err.sub >> expect.err &&
@@ -354,7 +355,7 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 			git config --unset -f .gitmodules submodule.subdir/deepsubmodule.fetchRecursive
 		)
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -388,7 +389,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 	head2=$(git rev-parse --short HEAD) &&
 	echo "From $pwd/." > expect.err.2 &&
 	echo "   $head1..$head2  master     -> origin/master" >>expect.err.2 &&
-	head -2 expect.err >> expect.err.2 &&
+	head -3 expect.err >> expect.err.2 &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules on-demand &&
@@ -399,7 +400,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 		cd downstream &&
 		git config --unset fetch.recurseSubmodules
 	) &&
-	test_i18ncmp expect.out.sub actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err.2 actual.err
 '
 
@@ -416,7 +417,7 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 	head2=$(git rev-parse --short HEAD) &&
 	echo "From $pwd/." > expect.err.2 &&
 	echo "   $head1..$head2  master     -> origin/master" >>expect.err.2 &&
-	head -2 expect.err >> expect.err.2 &&
+	head -3 expect.err >> expect.err.2 &&
 	(
 		cd downstream &&
 		git config submodule.submodule.fetchRecurseSubmodules on-demand &&
@@ -427,7 +428,7 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 		cd downstream &&
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
-	test_i18ncmp expect.out.sub actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err.2 actual.err
 '
 
-- 
2.6.4.443.ge094245.dirty


* [PATCHv2 2/7] xread: poll on non blocking fds
  2015-12-16  0:04 [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
  2015-12-16  0:04 ` [PATCHv2 1/7] submodule.c: write "Fetching submodule <foo>" to stderr Stefan Beller
@ 2015-12-16  0:04 ` Stefan Beller
  2015-12-17 20:12   ` Torsten Bögershausen
  2015-12-18  3:07   ` Jeff King
  2015-12-16  0:04 ` [PATCHv2 3/7] strbuf: add strbuf_read_once to read without blocking Stefan Beller
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 19+ messages in thread
From: Stefan Beller @ 2015-12-16  0:04 UTC (permalink / raw)
  To: git, gitster
  Cc: peff, jrnieder, johannes.schindelin, Jens.Lehmann, ericsunshine,
	j6t, Stefan Beller

The man page of read(2) says:

  EAGAIN The file descriptor fd refers to a file other than a socket
	 and has been marked nonblocking (O_NONBLOCK), and the read
	 would block.

  EAGAIN or EWOULDBLOCK
	 The file descriptor fd refers to a socket and has been marked
	 nonblocking (O_NONBLOCK), and the read would block.  POSIX.1-2001
	 allows either error to be returned for this case, and does not
	 require these constants to have the same value, so a portable
	 application should check for both possibilities.

If we get an EAGAIN or EWOULDBLOCK, the fd must have O_NONBLOCK set.
As the intent of xread is to read as much as possible, either until
the fd hits EOF or an actual error occurs, we can go easy on the
process feeding the fd: instead of spinning the whole time, wait for
it politely rather than busy-waiting.

We need not care whether the call to poll fails, as we are in an
infinite loop and can only get out via a conclusive read().

Signed-off-by: Stefan Beller <sbeller@google.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 wrapper.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/wrapper.c b/wrapper.c
index 6fcaa4d..1770efa 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -236,8 +236,24 @@ ssize_t xread(int fd, void *buf, size_t len)
 	    len = MAX_IO_SIZE;
 	while (1) {
 		nr = read(fd, buf, len);
-		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
-			continue;
+		if (nr < 0) {
+			if (errno == EINTR)
+				continue;
+			if (errno == EAGAIN || errno == EWOULDBLOCK) {
+				struct pollfd pfd;
+				pfd.events = POLLIN;
+				pfd.fd = fd;
+				/*
+				 * it is OK if this poll() failed; we
+				 * want to leave this infinite loop
+				 * only when read() returns with
+				 * success, or an expected failure,
+				 * which would be checked by the next
+				 * call to read(2).
+				 */
+				poll(&pfd, 1, -1);
+			}
+		}
 		return nr;
 	}
 }
-- 
2.6.4.443.ge094245.dirty


* [PATCHv2 3/7] strbuf: add strbuf_read_once to read without blocking
  2015-12-16  0:04 [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
  2015-12-16  0:04 ` [PATCHv2 1/7] submodule.c: write "Fetching submodule <foo>" to stderr Stefan Beller
  2015-12-16  0:04 ` [PATCHv2 2/7] xread: poll on non blocking fds Stefan Beller
@ 2015-12-16  0:04 ` Stefan Beller
  2015-12-16  0:04 ` [PATCHv2 4/7] sigchain: add command to pop all common signals Stefan Beller
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2015-12-16  0:04 UTC (permalink / raw)
  To: git, gitster
  Cc: peff, jrnieder, johannes.schindelin, Jens.Lehmann, ericsunshine,
	j6t, Stefan Beller

The new call reads from a file descriptor into a strbuf once. The
underlying call to xread is run only once; xread reattempts the read
only in case of EINTR, which makes the new call suitable for use on a
nonblocking fd.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 strbuf.c | 11 +++++++++++
 strbuf.h |  8 ++++++++
 2 files changed, 19 insertions(+)

diff --git a/strbuf.c b/strbuf.c
index d76f0ae..38686ff 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -384,6 +384,17 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t hint)
 	return sb->len - oldlen;
 }
 
+ssize_t strbuf_read_once(struct strbuf *sb, int fd, size_t hint)
+{
+	ssize_t cnt;
+
+	strbuf_grow(sb, hint ? hint : 8192);
+	cnt = xread(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
+	if (cnt > 0)
+		strbuf_setlen(sb, sb->len + cnt);
+	return cnt;
+}
+
 #define STRBUF_MAXLINK (2*PATH_MAX)
 
 int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint)
diff --git a/strbuf.h b/strbuf.h
index 7123fca..2bf90e7 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,6 +367,14 @@ extern size_t strbuf_fread(struct strbuf *, size_t, FILE *);
 extern ssize_t strbuf_read(struct strbuf *, int fd, size_t hint);
 
 /**
+ * Read the contents of a given file descriptor partially by using only one
+ * attempt of xread. The third argument can be used to give a hint about the
+ * file size, to avoid reallocs. Returns the number of new bytes appended to
+ * the sb.
+ */
+extern ssize_t strbuf_read_once(struct strbuf *, int fd, size_t hint);
+
+/**
  * Read the contents of a file, specified by its path. The third argument
  * can be used to give a hint about the file size, to avoid reallocs.
  */
-- 
2.6.4.443.ge094245.dirty


* [PATCHv2 4/7] sigchain: add command to pop all common signals
  2015-12-16  0:04 [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
                   ` (2 preceding siblings ...)
  2015-12-16  0:04 ` [PATCHv2 3/7] strbuf: add strbuf_read_once to read without blocking Stefan Beller
@ 2015-12-16  0:04 ` Stefan Beller
  2015-12-16  0:04 ` [PATCHv2 5/7] run-command: add an asynchronous parallel child processor Stefan Beller
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2015-12-16  0:04 UTC (permalink / raw)
  To: git, gitster
  Cc: peff, jrnieder, johannes.schindelin, Jens.Lehmann, ericsunshine,
	j6t, Stefan Beller

The new function removes all common signal handlers that were
installed by sigchain_push_common.

CC: Jeff King <peff@peff.net>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 sigchain.c | 9 +++++++++
 sigchain.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/sigchain.c b/sigchain.c
index faa375d..2ac43bb 100644
--- a/sigchain.c
+++ b/sigchain.c
@@ -50,3 +50,12 @@ void sigchain_push_common(sigchain_fun f)
 	sigchain_push(SIGQUIT, f);
 	sigchain_push(SIGPIPE, f);
 }
+
+void sigchain_pop_common(void)
+{
+	sigchain_pop(SIGPIPE);
+	sigchain_pop(SIGQUIT);
+	sigchain_pop(SIGTERM);
+	sigchain_pop(SIGHUP);
+	sigchain_pop(SIGINT);
+}
diff --git a/sigchain.h b/sigchain.h
index 618083b..138b20f 100644
--- a/sigchain.h
+++ b/sigchain.h
@@ -7,5 +7,6 @@ int sigchain_push(int sig, sigchain_fun f);
 int sigchain_pop(int sig);
 
 void sigchain_push_common(sigchain_fun f);
+void sigchain_pop_common(void);
 
 #endif /* SIGCHAIN_H */
-- 
2.6.4.443.ge094245.dirty


* [PATCHv2 5/7] run-command: add an asynchronous parallel child processor
  2015-12-16  0:04 [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
                   ` (3 preceding siblings ...)
  2015-12-16  0:04 ` [PATCHv2 4/7] sigchain: add command to pop all common signals Stefan Beller
@ 2015-12-16  0:04 ` Stefan Beller
  2015-12-16  0:04 ` [PATCHv2 6/7] fetch_populated_submodules: use new parallel job processing Stefan Beller
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2015-12-16  0:04 UTC (permalink / raw)
  To: git, gitster
  Cc: peff, jrnieder, johannes.schindelin, Jens.Lehmann, ericsunshine,
	j6t, Stefan Beller

This allows running external commands in parallel with ordered output
on stderr.

If we run external commands in parallel, we cannot pipe their output
directly to our stdout/stderr, as it would get mixed up. So each
process's output flows through a pipe, which we buffer. One subprocess
can be piped directly to our stdout/stderr for low-latency feedback to
the user.

Example:
Let's assume we have 5 submodules A,B,C,D,E, and that each fetch takes
a different amount of time because the submodules vary in size; the
output of fetches in sequential order might then look like this:

 time -->
 output: |---A---| |-B-| |-------C-------| |-D-| |-E-|

When we schedule these submodules into at most two parallel processes,
a schedule and sample output over time may look like this:

process 1: |---A---| |-D-| |-E-|

process 2: |-B-| |-------C-------|

output:    |---A---|B|---C-------|DE

So A will be perceived as if it ran normally in the single-child
version. As B has already finished by the time A is done, we can dump
its whole progress buffer to stderr, so that it looks as if it
finished in no time. Once that is done, C is determined to be the
visible child and its progress is reported in real time.

This way of presenting the output is really good for human
consumption, as it only changes the timing, not the actual output.

For machine consumption the output needs to be prepared by the tasks,
either with a prefix per line or per block, to indicate whose task's
output is being displayed, because the output order may not follow the
original sequential ordering:

 |----A----| |--B--| |-C-|

will be scheduled to be all parallel:

process 1: |----A----|
process 2: |--B--|
process 3: |-C-|
output:    |----A----|CB

This happens because C finishes before B does, so it is queued for
output before B.

To detect when a child has finished executing, we check, interleaved
with other actions (such as checking the liveliness of children or
starting new processes), whether the stderr pipe still exists. Once a
child has closed its stderr stream, we assume it will terminate very
soon, and we use `finish_command()` from the single external process
execution interface to collect its exit status.

By maintaining the strong assumption that stderr stays open until the
very end of a child process, we can avoid other hassle, such as an
implementation based on `waitpid(-1)`, which is not available on
Windows.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 run-command.c          | 335 +++++++++++++++++++++++++++++++++++++++++++++++++
 run-command.h          |  80 ++++++++++++
 t/t0061-run-command.sh |  53 ++++++++
 test-run-command.c     |  55 +++++++-
 4 files changed, 522 insertions(+), 1 deletion(-)

diff --git a/run-command.c b/run-command.c
index 13fa452..51fd72c 100644
--- a/run-command.c
+++ b/run-command.c
@@ -3,6 +3,8 @@
 #include "exec_cmd.h"
 #include "sigchain.h"
 #include "argv-array.h"
+#include "thread-utils.h"
+#include "strbuf.h"
 
 void child_process_init(struct child_process *child)
 {
@@ -865,3 +867,336 @@ int capture_command(struct child_process *cmd, struct strbuf *buf, size_t hint)
 	close(cmd->out);
 	return finish_command(cmd);
 }
+
+enum child_state {
+	GIT_CP_FREE,
+	GIT_CP_WORKING,
+	GIT_CP_WAIT_CLEANUP,
+};
+
+struct parallel_processes {
+	void *data;
+
+	int max_processes;
+	int nr_processes;
+
+	get_next_task_fn get_next_task;
+	start_failure_fn start_failure;
+	task_finished_fn task_finished;
+
+	struct {
+		enum child_state state;
+		struct child_process process;
+		struct strbuf err;
+		void *data;
+	} *children;
+	/*
+	 * The struct pollfd is logically part of *children,
+	 * but the system call expects it as its own array.
+	 */
+	struct pollfd *pfd;
+
+	unsigned shutdown : 1;
+
+	int output_owner;
+	struct strbuf buffered_output; /* of finished children */
+};
+
+static int default_start_failure(struct child_process *cp,
+				 struct strbuf *err,
+				 void *pp_cb,
+				 void *pp_task_cb)
+{
+	int i;
+
+	strbuf_addstr(err, "Starting a child failed:");
+	for (i = 0; cp->argv[i]; i++)
+		strbuf_addf(err, " %s", cp->argv[i]);
+
+	return 0;
+}
+
+static int default_task_finished(int result,
+				 struct child_process *cp,
+				 struct strbuf *err,
+				 void *pp_cb,
+				 void *pp_task_cb)
+{
+	int i;
+
+	if (!result)
+		return 0;
+
+	strbuf_addf(err, "A child failed with return code %d:", result);
+	for (i = 0; cp->argv[i]; i++)
+		strbuf_addf(err, " %s", cp->argv[i]);
+
+	return 0;
+}
+
+static void kill_children(struct parallel_processes *pp, int signo)
+{
+	int i, n = pp->max_processes;
+
+	for (i = 0; i < n; i++)
+		if (pp->children[i].state == GIT_CP_WORKING)
+			kill(pp->children[i].process.pid, signo);
+}
+
+static struct parallel_processes *pp_for_signal;
+
+static void handle_children_on_signal(int signo)
+{
+	kill_children(pp_for_signal, signo);
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+static void pp_init(struct parallel_processes *pp,
+		    int n,
+		    get_next_task_fn get_next_task,
+		    start_failure_fn start_failure,
+		    task_finished_fn task_finished,
+		    void *data)
+{
+	int i;
+
+	if (n < 1)
+		n = online_cpus();
+
+	pp->max_processes = n;
+
+	trace_printf("run_processes_parallel: preparing to run up to %d tasks", n);
+
+	pp->data = data;
+	if (!get_next_task)
+		die("BUG: you need to specify a get_next_task function");
+	pp->get_next_task = get_next_task;
+
+	pp->start_failure = start_failure ? start_failure : default_start_failure;
+	pp->task_finished = task_finished ? task_finished : default_task_finished;
+
+	pp->nr_processes = 0;
+	pp->output_owner = 0;
+	pp->shutdown = 0;
+	pp->children = xcalloc(n, sizeof(*pp->children));
+	pp->pfd = xcalloc(n, sizeof(*pp->pfd));
+	strbuf_init(&pp->buffered_output, 0);
+
+	for (i = 0; i < n; i++) {
+		strbuf_init(&pp->children[i].err, 0);
+		child_process_init(&pp->children[i].process);
+		pp->pfd[i].events = POLLIN | POLLHUP;
+		pp->pfd[i].fd = -1;
+	}
+
+	pp_for_signal = pp;
+	sigchain_push_common(handle_children_on_signal);
+}
+
+static void pp_cleanup(struct parallel_processes *pp)
+{
+	int i;
+
+	trace_printf("run_processes_parallel: done");
+	for (i = 0; i < pp->max_processes; i++) {
+		strbuf_release(&pp->children[i].err);
+		child_process_clear(&pp->children[i].process);
+	}
+
+	free(pp->children);
+	free(pp->pfd);
+
+	/*
+	 * When get_next_task added messages to the buffer in its last
+	 * iteration, the buffered output is non empty.
+	 */
+	fputs(pp->buffered_output.buf, stderr);
+	strbuf_release(&pp->buffered_output);
+
+	sigchain_pop_common();
+}
+
+/* returns
+ *  0 if a new task was started.
+ *  1 if no new job was started (get_next_task ran out of work, or a non-critical
+ *    problem with starting a new command)
+ * <0 no new job was started, user wishes to shutdown early. Use negative code
+ *    to signal the children.
+ */
+static int pp_start_one(struct parallel_processes *pp)
+{
+	int i, code;
+
+	for (i = 0; i < pp->max_processes; i++)
+		if (pp->children[i].state == GIT_CP_FREE)
+			break;
+	if (i == pp->max_processes)
+		die("BUG: bookkeeping is hard");
+
+	code = pp->get_next_task(&pp->children[i].process,
+				 &pp->children[i].err,
+				 pp->data,
+				 &pp->children[i].data);
+	if (!code) {
+		strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
+		strbuf_reset(&pp->children[i].err);
+		return 1;
+	}
+	pp->children[i].process.err = -1;
+	pp->children[i].process.stdout_to_stderr = 1;
+	pp->children[i].process.no_stdin = 1;
+
+	if (start_command(&pp->children[i].process)) {
+		code = pp->start_failure(&pp->children[i].process,
+					 &pp->children[i].err,
+					 pp->data,
+					 &pp->children[i].data);
+		strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
+		strbuf_reset(&pp->children[i].err);
+		if (code)
+			pp->shutdown = 1;
+		return code;
+	}
+
+	pp->nr_processes++;
+	pp->children[i].state = GIT_CP_WORKING;
+	pp->pfd[i].fd = pp->children[i].process.err;
+	return 0;
+}
+
+static void pp_buffer_stderr(struct parallel_processes *pp, int output_timeout)
+{
+	int i;
+
+	while ((i = poll(pp->pfd, pp->max_processes, output_timeout)) < 0) {
+		if (errno == EINTR)
+			continue;
+		pp_cleanup(pp);
+		die_errno("poll");
+	}
+
+	/* Buffer output from all pipes. */
+	for (i = 0; i < pp->max_processes; i++) {
+		if (pp->children[i].state == GIT_CP_WORKING &&
+		    pp->pfd[i].revents & (POLLIN | POLLHUP)) {
+			int n = strbuf_read_once(&pp->children[i].err,
+						 pp->children[i].process.err, 0);
+			if (n == 0) {
+				close(pp->children[i].process.err);
+				pp->children[i].state = GIT_CP_WAIT_CLEANUP;
+			} else if (n < 0)
+				if (errno != EAGAIN)
+					die_errno("read");
+		}
+	}
+}
+
+static void pp_output(struct parallel_processes *pp)
+{
+	int i = pp->output_owner;
+	if (pp->children[i].state == GIT_CP_WORKING &&
+	    pp->children[i].err.len) {
+		fputs(pp->children[i].err.buf, stderr);
+		strbuf_reset(&pp->children[i].err);
+	}
+}
+
+static int pp_collect_finished(struct parallel_processes *pp)
+{
+	int i, code;
+	int n = pp->max_processes;
+	int result = 0;
+
+	while (pp->nr_processes > 0) {
+		for (i = 0; i < pp->max_processes; i++)
+			if (pp->children[i].state == GIT_CP_WAIT_CLEANUP)
+				break;
+		if (i == pp->max_processes)
+			break;
+
+		code = finish_command(&pp->children[i].process);
+
+		code = pp->task_finished(code, &pp->children[i].process,
+					 &pp->children[i].err, pp->data,
+					 &pp->children[i].data);
+
+		if (code)
+			result = code;
+		if (code < 0)
+			break;
+
+		pp->nr_processes--;
+		pp->children[i].state = GIT_CP_FREE;
+		pp->pfd[i].fd = -1;
+		child_process_init(&pp->children[i].process);
+
+		if (i != pp->output_owner) {
+			strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
+			strbuf_reset(&pp->children[i].err);
+		} else {
+			fputs(pp->children[i].err.buf, stderr);
+			strbuf_reset(&pp->children[i].err);
+
+			/* Output all other finished child processes */
+			fputs(pp->buffered_output.buf, stderr);
+			strbuf_reset(&pp->buffered_output);
+
+			/*
+			 * Pick next process to output live.
+			 * NEEDSWORK:
+			 * For now we pick it randomly by doing a round
+			 * robin. Later we may want to pick the one with
+			 * the most output or the longest or shortest
+			 * running process time.
+			 */
+			for (i = 0; i < n; i++)
+				if (pp->children[(pp->output_owner + i) % n].state == GIT_CP_WORKING)
+					break;
+			pp->output_owner = (pp->output_owner + i) % n;
+		}
+	}
+	return result;
+}
+
+int run_processes_parallel(int n,
+			   get_next_task_fn get_next_task,
+			   start_failure_fn start_failure,
+			   task_finished_fn task_finished,
+			   void *pp_cb)
+{
+	int i, code;
+	int output_timeout = 100;
+	int spawn_cap = 4;
+	struct parallel_processes pp;
+
+	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb);
+	while (1) {
+		for (i = 0;
+		    i < spawn_cap && !pp.shutdown &&
+		    pp.nr_processes < pp.max_processes;
+		    i++) {
+			code = pp_start_one(&pp);
+			if (!code)
+				continue;
+			if (code < 0) {
+				pp.shutdown = 1;
+				kill_children(&pp, -code);
+			}
+			break;
+		}
+		if (!pp.nr_processes)
+			break;
+		pp_buffer_stderr(&pp, output_timeout);
+		pp_output(&pp);
+		code = pp_collect_finished(&pp);
+		if (code) {
+			pp.shutdown = 1;
+			if (code < 0)
+				kill_children(&pp, -code);
+		}
+	}
+
+	pp_cleanup(&pp);
+	return 0;
+}
diff --git a/run-command.h b/run-command.h
index 12bb26c..d5a57f9 100644
--- a/run-command.h
+++ b/run-command.h
@@ -122,4 +122,84 @@ int start_async(struct async *async);
 int finish_async(struct async *async);
 int in_async(void);
 
+/**
+ * This callback should initialize the child process and preload the
+ * error channel if desired. Preloading is useful if you want a
+ * message printed directly before the output of the child process.
+ * pp_cb is the callback cookie as passed to run_processes_parallel.
+ * You can store a child process specific callback cookie in pp_task_cb.
+ *
+ * Even after returning 0 to indicate that there are no more tasks,
+ * this function will be called again until there are no more running
+ * child processes.
+ *
+ * Return 1 if the next child is ready to run.
+ * Return 0 if there are currently no more tasks to be processed.
+ * To send a signal to the other child processes to abort them,
+ * return the negative signal number.
+ */
+typedef int (*get_next_task_fn)(struct child_process *cp,
+				struct strbuf *err,
+				void *pp_cb,
+				void **pp_task_cb);
+
+/**
+ * This callback is called whenever there are problems starting
+ * a new process.
+ *
+ * You must not write to stdout or stderr in this function. Add your
+ * message to the strbuf err instead, which will be printed without
+ * messing up the output of the other parallel processes.
+ *
+ * pp_cb is the callback cookie as passed into run_processes_parallel,
+ * pp_task_cb is the callback cookie as passed into get_next_task_fn.
+ *
+ * Return 0 to continue the parallel processing. To abort, return
+ * non-zero. To send a signal to the other child processes to abort
+ * them, return the negative signal number.
+ */
+typedef int (*start_failure_fn)(struct child_process *cp,
+				struct strbuf *err,
+				void *pp_cb,
+				void *pp_task_cb);
+
+/**
+ * This callback is called on every child process that finished processing.
+ *
+ * You must not write to stdout or stderr in this function. Add your
+ * message to the strbuf err instead, which will be printed without
+ * messing up the output of the other parallel processes.
+ *
+ * pp_cb is the callback cookie as passed into run_processes_parallel,
+ * pp_task_cb is the callback cookie as passed into get_next_task_fn.
+ *
+ * Return 0 to continue the parallel processing. To abort, return
+ * non-zero. To send a signal to the other child processes to abort
+ * them, return the negative signal number.
+ */
+typedef int (*task_finished_fn)(int result,
+				struct child_process *cp,
+				struct strbuf *err,
+				void *pp_cb,
+				void *pp_task_cb);
+
+/**
+ * Runs up to n processes at the same time. Whenever a process can be
+ * started, the callback get_next_task_fn is called to obtain the data
+ * required to start another child process.
+ *
+ * The children started via this function run in parallel. Their output
+ * (both stdout and stderr) is routed to stderr in a manner that output
+ * from different tasks does not interleave.
+ *
+ * If start_failure_fn or task_finished_fn is NULL, a default handler
+ * is used. The default handler prints an error message without
+ * issuing an emergency stop.
+ */
+int run_processes_parallel(int n,
+			   get_next_task_fn,
+			   start_failure_fn,
+			   task_finished_fn,
+			   void *pp_cb);
+
 #endif
diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index 9acf628..12228b4 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -47,4 +47,57 @@ test_expect_success POSIXPERM,SANITY 'unreadable directory in PATH' '
 	test_cmp expect actual
 '
 
+cat >expect <<-EOF
+preloaded output of a child
+Hello
+World
+preloaded output of a child
+Hello
+World
+preloaded output of a child
+Hello
+World
+preloaded output of a child
+Hello
+World
+EOF
+
+test_expect_success 'run_command runs in parallel with more jobs available than tasks' '
+	test-run-command run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'run_command runs in parallel with as many jobs as tasks' '
+	test-run-command run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'run_command runs in parallel with more tasks than jobs available' '
+	test-run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<-EOF
+preloaded output of a child
+asking for a quick stop
+preloaded output of a child
+asking for a quick stop
+preloaded output of a child
+asking for a quick stop
+EOF
+
+test_expect_success 'run_command is asked to abort gracefully' '
+	test-run-command run-command-abort 3 false 2>actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<-EOF
+no further jobs available
+EOF
+
+test_expect_success 'run_command reports no further jobs available' '
+	test-run-command run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/test-run-command.c b/test-run-command.c
index 89c7de2..f964507 100644
--- a/test-run-command.c
+++ b/test-run-command.c
@@ -10,16 +10,54 @@
 
 #include "git-compat-util.h"
 #include "run-command.h"
+#include "argv-array.h"
+#include "strbuf.h"
 #include <string.h>
 #include <errno.h>
 
+static int number_callbacks;
+static int parallel_next(struct child_process *cp,
+			 struct strbuf *err,
+			 void *cb,
+			 void **task_cb)
+{
+	struct child_process *d = cb;
+	if (number_callbacks >= 4)
+		return 0;
+
+	argv_array_pushv(&cp->args, d->argv);
+	strbuf_addf(err, "preloaded output of a child\n");
+	number_callbacks++;
+	return 1;
+}
+
+static int no_job(struct child_process *cp,
+		  struct strbuf *err,
+		  void *cb,
+		  void **task_cb)
+{
+	strbuf_addf(err, "no further jobs available\n");
+	return 0;
+}
+
+static int task_finished(int result,
+			 struct child_process *cp,
+			 struct strbuf *err,
+			 void *pp_cb,
+			 void *pp_task_cb)
+{
+	strbuf_addf(err, "asking for a quick stop\n");
+	return 1;
+}
+
 int main(int argc, char **argv)
 {
 	struct child_process proc = CHILD_PROCESS_INIT;
+	int jobs;
 
 	if (argc < 3)
 		return 1;
-	proc.argv = (const char **)argv+2;
+	proc.argv = (const char **)argv + 2;
 
 	if (!strcmp(argv[1], "start-command-ENOENT")) {
 		if (start_command(&proc) < 0 && errno == ENOENT)
@@ -30,6 +68,21 @@ int main(int argc, char **argv)
 	if (!strcmp(argv[1], "run-command"))
 		exit(run_command(&proc));
 
+	jobs = atoi(argv[2]);
+	proc.argv = (const char **)argv + 3;
+
+	if (!strcmp(argv[1], "run-command-parallel"))
+		exit(run_processes_parallel(jobs, parallel_next,
+					    NULL, NULL, &proc));
+
+	if (!strcmp(argv[1], "run-command-abort"))
+		exit(run_processes_parallel(jobs, parallel_next,
+					    NULL, task_finished, &proc));
+
+	if (!strcmp(argv[1], "run-command-no-jobs"))
+		exit(run_processes_parallel(jobs, no_job,
+					    NULL, task_finished, &proc));
+
 	fprintf(stderr, "check usage\n");
 	return 1;
 }
-- 
2.6.4.443.ge094245.dirty
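
As an aside for readers of the API documentation above: the round-robin
hand-off of the output owner at the end of pp_collect_finished() can be
illustrated in isolation. This is a simplified, self-contained sketch
(a plain state array and the hypothetical name pick_next_owner standing
in for pp->children), not the actual git code:

```c
enum cp_state { CP_FREE, CP_WORKING };

/*
 * Simplified model of the owner-selection loop in pp_collect_finished():
 * starting at the current owner's slot, scan round-robin for the next
 * child that is still working; if none is, the owner stays unchanged.
 */
static int pick_next_owner(const enum cp_state *state, int n, int owner)
{
	int i;

	for (i = 0; i < n; i++)
		if (state[(owner + i) % n] == CP_WORKING)
			break;
	return (owner + i) % n;
}
```

With states {FREE, WORKING, FREE} and owner 0 this yields 1; with no
working child it returns the current owner unchanged, matching the
`(pp->output_owner + i) % n` arithmetic in the patch.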

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCHv2 6/7] fetch_populated_submodules: use new parallel job processing
  2015-12-16  0:04 [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
                   ` (4 preceding siblings ...)
  2015-12-16  0:04 ` [PATCHv2 5/7] run-command: add an asynchronous parallel child processor Stefan Beller
@ 2015-12-16  0:04 ` Stefan Beller
  2015-12-16  0:04 ` [PATCHv2 7/7] submodules: allow parallel fetching, add tests and documentation Stefan Beller
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2015-12-16  0:04 UTC (permalink / raw)
  To: git, gitster
  Cc: peff, jrnieder, johannes.schindelin, Jens.Lehmann, ericsunshine,
	j6t, Stefan Beller

A later patch will enable parallel processing of submodules; this one
only adds the machinery for it, so this change should not alter any
user-facing behavior.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule.c | 142 +++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 98 insertions(+), 44 deletions(-)

diff --git a/submodule.c b/submodule.c
index 8386477..6a2d786 100644
--- a/submodule.c
+++ b/submodule.c
@@ -12,6 +12,7 @@
 #include "sha1-array.h"
 #include "argv-array.h"
 #include "blob.h"
+#include "thread-utils.h"
 
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
 static struct string_list changed_submodule_paths;
@@ -610,37 +611,28 @@ static void calculate_changed_submodule_paths(void)
 	initialized_fetch_ref_tips = 0;
 }
 
-int fetch_populated_submodules(const struct argv_array *options,
-			       const char *prefix, int command_line_option,
-			       int quiet)
+struct submodule_parallel_fetch {
+	int count;
+	struct argv_array args;
+	const char *work_tree;
+	const char *prefix;
+	int command_line_option;
+	int quiet;
+	int result;
+};
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0}
+
+static int get_next_submodule(struct child_process *cp,
+			      struct strbuf *err, void *data, void **task_cb)
 {
-	int i, result = 0;
-	struct child_process cp = CHILD_PROCESS_INIT;
-	struct argv_array argv = ARGV_ARRAY_INIT;
-	const char *work_tree = get_git_work_tree();
-	if (!work_tree)
-		goto out;
-
-	if (read_cache() < 0)
-		die("index file corrupt");
-
-	argv_array_push(&argv, "fetch");
-	for (i = 0; i < options->argc; i++)
-		argv_array_push(&argv, options->argv[i]);
-	argv_array_push(&argv, "--recurse-submodules-default");
-	/* default value, "--submodule-prefix" and its value are added later */
-
-	cp.env = local_repo_env;
-	cp.git_cmd = 1;
-	cp.no_stdin = 1;
-
-	calculate_changed_submodule_paths();
+	int ret = 0;
+	struct submodule_parallel_fetch *spf = data;
 
-	for (i = 0; i < active_nr; i++) {
+	for ( ; spf->count < active_nr; spf->count++) {
 		struct strbuf submodule_path = STRBUF_INIT;
 		struct strbuf submodule_git_dir = STRBUF_INIT;
 		struct strbuf submodule_prefix = STRBUF_INIT;
-		const struct cache_entry *ce = active_cache[i];
+		const struct cache_entry *ce = active_cache[spf->count];
 		const char *git_dir, *default_argv;
 		const struct submodule *submodule;
 
@@ -652,7 +644,7 @@ int fetch_populated_submodules(const struct argv_array *options,
 			submodule = submodule_from_name(null_sha1, ce->name);
 
 		default_argv = "yes";
-		if (command_line_option == RECURSE_SUBMODULES_DEFAULT) {
+		if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
 			if (submodule &&
 			    submodule->fetch_recurse !=
 						RECURSE_SUBMODULES_NONE) {
@@ -675,40 +667,102 @@ int fetch_populated_submodules(const struct argv_array *options,
 					default_argv = "on-demand";
 				}
 			}
-		} else if (command_line_option == RECURSE_SUBMODULES_ON_DEMAND) {
+		} else if (spf->command_line_option == RECURSE_SUBMODULES_ON_DEMAND) {
 			if (!unsorted_string_list_lookup(&changed_submodule_paths, ce->name))
 				continue;
 			default_argv = "on-demand";
 		}
 
-		strbuf_addf(&submodule_path, "%s/%s", work_tree, ce->name);
+		strbuf_addf(&submodule_path, "%s/%s", spf->work_tree, ce->name);
 		strbuf_addf(&submodule_git_dir, "%s/.git", submodule_path.buf);
-		strbuf_addf(&submodule_prefix, "%s%s/", prefix, ce->name);
+		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
 		git_dir = read_gitfile(submodule_git_dir.buf);
 		if (!git_dir)
 			git_dir = submodule_git_dir.buf;
 		if (is_directory(git_dir)) {
-			if (!quiet)
-				fprintf(stderr, "Fetching submodule %s%s\n", prefix, ce->name);
-			cp.dir = submodule_path.buf;
-			argv_array_push(&argv, default_argv);
-			argv_array_push(&argv, "--submodule-prefix");
-			argv_array_push(&argv, submodule_prefix.buf);
-			cp.argv = argv.argv;
-			if (run_command(&cp))
-				result = 1;
-			argv_array_pop(&argv);
-			argv_array_pop(&argv);
-			argv_array_pop(&argv);
+			child_process_init(cp);
+			cp->dir = strbuf_detach(&submodule_path, NULL);
+			cp->env = local_repo_env;
+			cp->git_cmd = 1;
+			if (!spf->quiet)
+				strbuf_addf(err, "Fetching submodule %s%s\n",
+					    spf->prefix, ce->name);
+			argv_array_init(&cp->args);
+			argv_array_pushv(&cp->args, spf->args.argv);
+			argv_array_push(&cp->args, default_argv);
+			argv_array_push(&cp->args, "--submodule-prefix");
+			argv_array_push(&cp->args, submodule_prefix.buf);
+			ret = 1;
 		}
 		strbuf_release(&submodule_path);
 		strbuf_release(&submodule_git_dir);
 		strbuf_release(&submodule_prefix);
+		if (ret) {
+			spf->count++;
+			return 1;
+		}
 	}
-	argv_array_clear(&argv);
+	return 0;
+}
+
+static int fetch_start_failure(struct child_process *cp,
+			       struct strbuf *err,
+			       void *cb, void *task_cb)
+{
+	struct submodule_parallel_fetch *spf = cb;
+
+	spf->result = 1;
+
+	return 0;
+}
+
+static int fetch_finish(int retvalue, struct child_process *cp,
+			struct strbuf *err, void *cb, void *task_cb)
+{
+	struct submodule_parallel_fetch *spf = cb;
+
+	if (retvalue)
+		spf->result = 1;
+
+	return 0;
+}
+
+int fetch_populated_submodules(const struct argv_array *options,
+			       const char *prefix, int command_line_option,
+			       int quiet)
+{
+	int i;
+	int max_parallel_jobs = 1;
+	struct submodule_parallel_fetch spf = SPF_INIT;
+
+	spf.work_tree = get_git_work_tree();
+	spf.command_line_option = command_line_option;
+	spf.quiet = quiet;
+	spf.prefix = prefix;
+
+	if (!spf.work_tree)
+		goto out;
+
+	if (read_cache() < 0)
+		die("index file corrupt");
+
+	argv_array_push(&spf.args, "fetch");
+	for (i = 0; i < options->argc; i++)
+		argv_array_push(&spf.args, options->argv[i]);
+	argv_array_push(&spf.args, "--recurse-submodules-default");
+	/* default value, "--submodule-prefix" and its value are added later */
+
+	calculate_changed_submodule_paths();
+	run_processes_parallel(max_parallel_jobs,
+			       get_next_submodule,
+			       fetch_start_failure,
+			       fetch_finish,
+			       &spf);
+
+	argv_array_clear(&spf.args);
 out:
 	string_list_clear(&changed_submodule_paths, 1);
-	return result;
+	return spf.result;
 }
 
 unsigned is_submodule_modified(const char *path, int ignore_untracked)
-- 
2.6.4.443.ge094245.dirty
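
The core of the conversion above is turning the old for-loop over the
index into a resumable callback: the loop counter moves into struct
submodule_parallel_fetch and get_next_submodule() hands out at most one
task per call. Stripped of the git specifics, the pattern looks like
this (hypothetical names; a sketch rather than git code):

```c
/* Persistent iterator state, as spf->count persists across calls. */
struct iter_state {
	int count;	/* next index to examine */
};

/* Stand-in predicate for "is this a populated submodule to fetch?". */
static int wants_fetch(int entry)
{
	return entry % 2 == 0;
}

/*
 * get_next_submodule()-shaped callback: resume scanning at the saved
 * index, return 1 with one task, or 0 when the entries are exhausted.
 */
static int get_next_task(struct iter_state *st, const int *entries,
			 int n, int *task)
{
	for (; st->count < n; st->count++) {
		if (!wants_fetch(entries[st->count]))
			continue;
		*task = entries[st->count];
		st->count++;	/* skip past this entry on the next call */
		return 1;
	}
	return 0;
}
```

The extra st->count++ before returning mirrors the spf->count++ in the
patch: the for-loop's own increment is bypassed by the early return.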


* [PATCHv2 7/7] submodules: allow parallel fetching, add tests and documentation
  2015-12-16  0:04 [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
                   ` (5 preceding siblings ...)
  2015-12-16  0:04 ` [PATCHv2 6/7] fetch_populated_submodules: use new parallel job processing Stefan Beller
@ 2015-12-16  0:04 ` Stefan Beller
  2015-12-16  0:19 ` [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
  2015-12-16 20:20 ` Junio C Hamano
  8 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2015-12-16  0:04 UTC (permalink / raw)
  To: git, gitster
  Cc: peff, jrnieder, johannes.schindelin, Jens.Lehmann, ericsunshine,
	j6t, Stefan Beller

This enables the work of the previous patches.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/fetch-options.txt |  7 +++++++
 builtin/fetch.c                 |  6 +++++-
 builtin/pull.c                  |  6 ++++++
 submodule.c                     |  3 +--
 submodule.h                     |  2 +-
 t/t5526-fetch-submodules.sh     | 20 ++++++++++++++++++++
 6 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 45583d8..6b109f6 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -100,6 +100,13 @@ ifndef::git-pull[]
 	reference to a commit that isn't already in the local submodule
 	clone.
 
+-j::
+--jobs=<n>::
+	Number of parallel children to be used for fetching submodules.
+	Each child fetches a different submodule, so fetching many
+	submodules is faster. By default submodules are fetched one at
+	a time.
+
 --no-recurse-submodules::
 	Disable recursive fetching of submodules (this has the same effect as
 	using the '--recurse-submodules=no' option).
diff --git a/builtin/fetch.c b/builtin/fetch.c
index c85f347..586840d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -37,6 +37,7 @@ static int prune = -1; /* unspecified */
 static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
 static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow;
+static int max_children = 1;
 static const char *depth;
 static const char *upload_pack;
 static struct strbuf default_rla = STRBUF_INIT;
@@ -99,6 +100,8 @@ static struct option builtin_fetch_options[] = {
 		    N_("fetch all tags and associated objects"), TAGS_SET),
 	OPT_SET_INT('n', NULL, &tags,
 		    N_("do not fetch all tags (--no-tags)"), TAGS_UNSET),
+	OPT_INTEGER('j', "jobs", &max_children,
+		    N_("number of submodules fetched in parallel")),
 	OPT_BOOL('p', "prune", &prune,
 		 N_("prune remote-tracking branches no longer on remote")),
 	{ OPTION_CALLBACK, 0, "recurse-submodules", NULL, N_("on-demand"),
@@ -1213,7 +1216,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		result = fetch_populated_submodules(&options,
 						    submodule_prefix,
 						    recurse_submodules,
-						    verbosity < 0);
+						    verbosity < 0,
+						    max_children);
 		argv_array_clear(&options);
 	}
 
diff --git a/builtin/pull.c b/builtin/pull.c
index 5145fc6..9e3c738 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -95,6 +95,7 @@ static int opt_force;
 static char *opt_tags;
 static char *opt_prune;
 static char *opt_recurse_submodules;
+static char *max_children;
 static int opt_dry_run;
 static char *opt_keep;
 static char *opt_depth;
@@ -178,6 +179,9 @@ static struct option pull_options[] = {
 		N_("on-demand"),
 		N_("control recursive fetching of submodules"),
 		PARSE_OPT_OPTARG),
+	OPT_PASSTHRU('j', "jobs", &max_children, N_("n"),
+		N_("number of submodules pulled in parallel"),
+		PARSE_OPT_OPTARG),
 	OPT_BOOL(0, "dry-run", &opt_dry_run,
 		N_("dry run")),
 	OPT_PASSTHRU('k', "keep", &opt_keep, NULL,
@@ -525,6 +529,8 @@ static int run_fetch(const char *repo, const char **refspecs)
 		argv_array_push(&args, opt_prune);
 	if (opt_recurse_submodules)
 		argv_array_push(&args, opt_recurse_submodules);
+	if (max_children)
+		argv_array_push(&args, max_children);
 	if (opt_dry_run)
 		argv_array_push(&args, "--dry-run");
 	if (opt_keep)
diff --git a/submodule.c b/submodule.c
index 6a2d786..0b48734 100644
--- a/submodule.c
+++ b/submodule.c
@@ -729,10 +729,9 @@ static int fetch_finish(int retvalue, struct child_process *cp,
 
 int fetch_populated_submodules(const struct argv_array *options,
 			       const char *prefix, int command_line_option,
-			       int quiet)
+			       int quiet, int max_parallel_jobs)
 {
 	int i;
-	int max_parallel_jobs = 1;
 	struct submodule_parallel_fetch spf = SPF_INIT;
 
 	spf.work_tree = get_git_work_tree();
diff --git a/submodule.h b/submodule.h
index 5507c3d..cbc0003 100644
--- a/submodule.h
+++ b/submodule.h
@@ -31,7 +31,7 @@ void set_config_fetch_recurse_submodules(int value);
 void check_for_new_submodule_commits(unsigned char new_sha1[20]);
 int fetch_populated_submodules(const struct argv_array *options,
 			       const char *prefix, int command_line_option,
-			       int quiet);
+			       int quiet, int max_parallel_jobs);
 unsigned is_submodule_modified(const char *path, int ignore_untracked);
 int submodule_uses_gitfile(const char *path);
 int ok_to_remove_submodule(const char *path);
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 17759b14..1241146 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -71,6 +71,17 @@ test_expect_success "fetch --recurse-submodules recurses into submodules" '
 	test_i18ncmp expect.err actual.err
 '
 
+test_expect_success "fetch --recurse-submodules -j2 has the same output behaviour" '
+	add_upstream_commit &&
+	(
+		cd downstream &&
+		GIT_TRACE=$(pwd)/../trace.out git fetch --recurse-submodules -j2 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	test_i18ncmp expect.err actual.err &&
+	grep "2 tasks" trace.out
+'
+
 test_expect_success "fetch alone only fetches superproject" '
 	add_upstream_commit &&
 	(
@@ -140,6 +151,15 @@ test_expect_success "--quiet propagates to submodules" '
 	! test -s actual.err
 '
 
+test_expect_success "--quiet propagates to parallel submodules" '
+	(
+		cd downstream &&
+		git fetch --recurse-submodules -j 2 --quiet  >../actual.out 2>../actual.err
+	) &&
+	! test -s actual.out &&
+	! test -s actual.err
+'
+
 test_expect_success "--dry-run propagates to submodules" '
 	add_upstream_commit &&
 	(
-- 
2.6.4.443.ge094245.dirty


* [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7
  2015-12-16  0:04 [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
                   ` (6 preceding siblings ...)
  2015-12-16  0:04 ` [PATCHv2 7/7] submodules: allow parallel fetching, add tests and documentation Stefan Beller
@ 2015-12-16  0:19 ` Stefan Beller
  2015-12-16 20:20 ` Junio C Hamano
  8 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2015-12-16  0:19 UTC (permalink / raw)
  To: git, gitster
  Cc: peff, jrnieder, johannes.schindelin, Jens.Lehmann, ericsunshine,
	j6t, Stefan Beller

with the interdiff below:

diff --git a/git-compat-util.h b/git-compat-util.h
index 87456a3..8e39867 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -723,7 +723,6 @@ extern void *xmmap(void *start, size_t length, int prot, int flags, int fd, off_
 extern void *xmmap_gently(void *start, size_t length, int prot, int flags, int fd, off_t offset);
 extern int xopen(const char *path, int flags, ...);
 extern ssize_t xread(int fd, void *buf, size_t len);
-extern ssize_t xread_nonblock(int fd, void *buf, size_t len);
 extern ssize_t xwrite(int fd, const void *buf, size_t len);
 extern ssize_t xpread(int fd, void *buf, size_t len, off_t offset);
 extern int xdup(int fd);
diff --git a/strbuf.c b/strbuf.c
index b552a13..38686ff 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -389,7 +389,7 @@ ssize_t strbuf_read_once(struct strbuf *sb, int fd, size_t hint)
 	ssize_t cnt;

 	strbuf_grow(sb, hint ? hint : 8192);
-	cnt = xread_nonblock(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
+	cnt = xread(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
 	if (cnt > 0)
 		strbuf_setlen(sb, sb->len + cnt);
 	return cnt;
diff --git a/strbuf.h b/strbuf.h
index c3e5980..2bf90e7 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,10 +367,10 @@ extern size_t strbuf_fread(struct strbuf *, size_t, FILE *);
 extern ssize_t strbuf_read(struct strbuf *, int fd, size_t hint);

 /**
- * Returns the number of new bytes appended to the sb.
- * Negative return value signals there was an error returned from
- * underlying read(2), in which case the caller should check errno.
- * e.g. errno == EAGAIN when the read may have blocked.
+ * Read the contents of a given file descriptor partially by using only one
+ * attempt of xread. The third argument can be used to give a hint about the
+ * file size, to avoid reallocs. Returns the number of new bytes appended to
+ * the sb.
  */
 extern ssize_t strbuf_read_once(struct strbuf *, int fd, size_t hint);
diff --git a/wrapper.c b/wrapper.c
index f71237c..1770efa 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -243,7 +243,14 @@ ssize_t xread(int fd, void *buf, size_t len)
 				struct pollfd pfd;
 				pfd.events = POLLIN;
 				pfd.fd = fd;
-				/* We deliberately ignore the return value */
+				/*
+				 * it is OK if this poll() failed; we
+				 * want to leave this infinite loop
+				 * only when read() returns with
+				 * success, or an expected failure,
+				 * which would be checked by the next
+				 * call to read(2).
+				 */
 				poll(&pfd, 1, -1);
 			}
 		}
@@ -252,28 +259,6 @@ ssize_t xread(int fd, void *buf, size_t len)
 }

 /*
- * xread_nonblock() is the same a read(), but it automatically restarts read()
- * interrupted operations (EINTR). xread_nonblock() DOES NOT GUARANTEE that
- * "len" bytes is read. EWOULDBLOCK is turned into EAGAIN.
- */
-ssize_t xread_nonblock(int fd, void *buf, size_t len)
-{
-	ssize_t nr;
-	if (len > MAX_IO_SIZE)
-		len = MAX_IO_SIZE;
-	while (1) {
-		nr = read(fd, buf, len);
-		if (nr < 0) {
-			if (errno == EINTR)
-				continue;
-			if (errno == EWOULDBLOCK)
-				errno = EAGAIN;
-		}
-		return nr;
-	}
-}
-
-/*
  * xwrite() is the same a write(), but it automatically restarts write()
  * operations with a recoverable error (EAGAIN and EINTR). xwrite() DOES NOT
  * GUARANTEE that "len" bytes is written even if the operation is successful.


* Re: [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7
  2015-12-16  0:04 [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
                   ` (7 preceding siblings ...)
  2015-12-16  0:19 ` [PATCHv2 0/7] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
@ 2015-12-16 20:20 ` Junio C Hamano
  8 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2015-12-16 20:20 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, peff, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t

Stefan Beller <sbeller@google.com> writes:

> I am sending out a new version for replacing sb/submodule-parallel-fetch for
> the time after the 2.7 release.
>
> * Dropped the patch, which introduces xread_nonblock
> * strbuf_read_once uses xread now. This is safe as we poll before using
>   strbuf_read_once, so we know we won't stall.

"That is only true for the current callers" was my first reaction,
but it is safe and sensible even for future callers.  If they have
something better to do than getting stuck before reading some, they
ought to be checking for POLLIN before calling this, and if not,
they do not have to be forced to spin for themselves until this
returns some new payload.

I like that.

> * have the commit message reworded for "run-command: add an
>   asynchronous parallel child processor" with Johannes' suggestion.

Thanks.


* Re: [PATCHv2 2/7] xread: poll on non blocking fds
  2015-12-16  0:04 ` [PATCHv2 2/7] xread: poll on non blocking fds Stefan Beller
@ 2015-12-17 20:12   ` Torsten Bögershausen
  2015-12-17 20:22     ` Stefan Beller
  2015-12-18  3:07   ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Torsten Bögershausen @ 2015-12-17 20:12 UTC (permalink / raw)
  To: Stefan Beller, git, gitster
  Cc: peff, jrnieder, johannes.schindelin, Jens.Lehmann, ericsunshine, j6t

On 16.12.15 01:04, Stefan Beller wrote:
> The man page of read(2) says:
> 
>   EAGAIN The file descriptor fd refers to a file other than a socket
> 	 and has been marked nonblocking (O_NONBLOCK), and the read
> 	 would block.
> 
>   EAGAIN or EWOULDBLOCK
> 	 The file descriptor fd refers to a socket and has been marked
> 	 nonblocking (O_NONBLOCK), and the read would block.  POSIX.1-2001
> 	 allows either error to be returned for this case, and does not
> 	 require these constants to have the same value, so a portable
> 	 application should check for both possibilities.
> 
> If we get an EAGAIN or EWOULDBLOCK the fd must have set O_NONBLOCK.
> As the intent of xread is to read as much as possible either until the
> fd is EOF or an actual error occurs, we can ease the feeder of the fd
> by not spinning the whole time, but rather wait for it politely by not
> busy waiting.
> 
> We should not care if the call to poll failed, as we're in an infinite
> loop and can only get out with the correct read().
I'm not sure if this is valid under all circumstances:
This is what "man poll" says under Linux: 
[]
 ENOMEM There was no space to allocate file descriptor tables.
[]
And this is from Mac OS, ("BSD System Calls Manual")
ERRORS
     Poll() will fail if:

     [EAGAIN]           Allocation of internal data structures fails.  A sub-
                        sequent request may succeed.
And this is opengroup:
http://pubs.opengroup.org/onlinepubs/9699919799//functions/poll.html:
[EAGAIN]
    The allocation of internal data structures failed but a subsequent request may succeed.

read() may return EAGAIN, but poll() itself may fail to allocate memory and return an error.
Is it always guaranteed that the loop terminates?



> 
> Signed-off-by: Stefan Beller <sbeller@google.com>
> Acked-by: Johannes Sixt <j6t@kdbg.org>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  wrapper.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/wrapper.c b/wrapper.c
> index 6fcaa4d..1770efa 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -236,8 +236,24 @@ ssize_t xread(int fd, void *buf, size_t len)
>  	    len = MAX_IO_SIZE;
>  	while (1) {
>  		nr = read(fd, buf, len);
> -		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
> -			continue;
> +		if (nr < 0) {
> +			if (errno == EINTR)
> +				continue;
> +			if (errno == EAGAIN || errno == EWOULDBLOCK) {
> +				struct pollfd pfd;
> +				pfd.events = POLLIN;
> +				pfd.fd = fd;
> +				/*
> +				 * it is OK if this poll() failed; we
> +				 * want to leave this infinite loop
> +				 * only when read() returns with
> +				 * success, or an expected failure,
> +				 * which would be checked by the next
> +				 * call to read(2).
> +				 */
> +				poll(&pfd, 1, -1);
> +			}
> +		}
>  		return nr;
>  	}
>  }
> 


* Re: [PATCHv2 2/7] xread: poll on non blocking fds
  2015-12-17 20:12   ` Torsten Bögershausen
@ 2015-12-17 20:22     ` Stefan Beller
  2015-12-17 20:42       ` Torsten Bögershausen
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2015-12-17 20:22 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: git, Junio C Hamano, Jeff King, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine, Johannes Sixt

On Thu, Dec 17, 2015 at 12:12 PM, Torsten Bögershausen <tboegi@web.de> wrote:
> On 16.12.15 01:04, Stefan Beller wrote:
>> The man page of read(2) says:
>>
>>   EAGAIN The file descriptor fd refers to a file other than a socket
>>        and has been marked nonblocking (O_NONBLOCK), and the read
>>        would block.
>>
>>   EAGAIN or EWOULDBLOCK
>>        The file descriptor fd refers to a socket and has been marked
>>        nonblocking (O_NONBLOCK), and the read would block.  POSIX.1-2001
>>        allows either error to be returned for this case, and does not
>>        require these constants to have the same value, so a portable
>>        application should check for both possibilities.
>>
>> If we get an EAGAIN or EWOULDBLOCK the fd must have set O_NONBLOCK.
>> As the intent of xread is to read as much as possible either until the
>> fd is EOF or an actual error occurs, we can ease the feeder of the fd
>> by not spinning the whole time, but rather wait for it politely by not
>> busy waiting.
>>
>> We should not care if the call to poll failed, as we're in an infinite
>> loop and can only get out with the correct read().
> I'm not sure if this is valid under all circumstances:
> This is what "man poll" says under Linux:
> []
>  ENOMEM There was no space to allocate file descriptor tables.
> []
> And this is from Mac OS, ("BSD System Calls Manual")
> ERRORS
>      Poll() will fail if:
>
>      [EAGAIN]           Allocation of internal data structures fails.  A sub-
>                         sequent request may succeed.
> And this is opengroup:
> http://pubs.opengroup.org/onlinepubs/9699919799//functions/poll.html:
> [EAGAIN]
>     The allocation of internal data structures failed but a subsequent request may succeed.
>
> read() may return EAGAIN, but poll() may fail to allocate memory, and fail.
> Is it always guaranteed that the loop is terminated?

In case poll() fails (assume it is a no-op), the logic should not have
changed with this patch?

Looking closely:

>>       while (1) {
>>               nr = read(fd, buf, len);
>> -             if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
>> -                     continue;
>> +             if (nr < 0) {
>> +                     if (errno == EINTR)
>> +                             continue;
>> +                     if (errno == EAGAIN || errno == EWOULDBLOCK) {
>> +                             struct pollfd pfd;
>> +                             pfd.events = POLLIN;
>> +                             pfd.fd = fd;
>> +                             /*
>> +                              * it is OK if this poll() failed; we
>> +                              * want to leave this infinite loop
>> +                              * only when read() returns with
>> +                              * success, or an expected failure,
>> +                              * which would be checked by the next
>> +                              * call to read(2).
>> +                              */
>> +                             poll(&pfd, 1, -1);

Or do you mean to insert another continue in here?

>> +                     }
>> +             }
>>               return nr;
>>       }
>>  }
>>
>

* Re: [PATCHv2 2/7] xread: poll on non blocking fds
  2015-12-17 20:22     ` Stefan Beller
@ 2015-12-17 20:42       ` Torsten Bögershausen
  2015-12-17 20:51         ` Stefan Beller
  2015-12-18  3:13         ` Jeff King
  0 siblings, 2 replies; 19+ messages in thread
From: Torsten Bögershausen @ 2015-12-17 20:42 UTC (permalink / raw)
  To: Stefan Beller, Torsten Bögershausen
  Cc: git, Junio C Hamano, Jeff King, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine, Johannes Sixt

On 17.12.15 21:22, Stefan Beller wrote:
> On Thu, Dec 17, 2015 at 12:12 PM, Torsten Bögershausen <tboegi@web.de> wrote:
>> On 16.12.15 01:04, Stefan Beller wrote:
>>> The man page of read(2) says:
>>>
>>>   EAGAIN The file descriptor fd refers to a file other than a socket
>>>        and has been marked nonblocking (O_NONBLOCK), and the read
>>>        would block.
>>>
>>>   EAGAIN or EWOULDBLOCK
>>>        The file descriptor fd refers to a socket and has been marked
>>>        nonblocking (O_NONBLOCK), and the read would block.  POSIX.1-2001
>>>        allows either error to be returned for this case, and does not
>>>        require these constants to have the same value, so a portable
>>>        application should check for both possibilities.
>>>
>>> If we get an EAGAIN or EWOULDBLOCK the fd must have set O_NONBLOCK.
>>> As the intent of xread is to read as much as possible either until the
>>> fd is EOF or an actual error occurs, we can ease the feeder of the fd
>>> by not spinning the whole time, but rather wait for it politely by not
>>> busy waiting.
>>>
>>> We should not care if the call to poll failed, as we're in an infinite
>>> loop and can only get out with the correct read().
>> I'm not sure if this is valid under all circumstances:
>> This is what "man poll" says under Linux:
>> []
>>  ENOMEM There was no space to allocate file descriptor tables.
>> []
>> And this is from Mac OS, ("BSD System Calls Manual")
>> ERRORS
>>      Poll() will fail if:
>>
>>      [EAGAIN]           Allocation of internal data structures fails.  A sub-
>>                         sequent request may succeed.
>> And this is opengroup:
>> http://pubs.opengroup.org/onlinepubs/9699919799//functions/poll.html:
>> [EAGAIN]
>>     The allocation of internal data structures failed but a subsequent request may succeed.
>>
>> read() may return EAGAIN, but poll() may fail to allocate memory, and fail.
>> Is it always guaranteed that the loop is terminated?
> 
> In case poll() fails (assume it is a no-op), the logic should not have
> changed with this patch?
> 
> Looking closely:
> 
>>>       while (1) {
>>>               nr = read(fd, buf, len);
>>> -             if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
>>> -                     continue;
>>> +             if (nr < 0) {
>>> +                     if (errno == EINTR)
>>> +                             continue;
>>> +                     if (errno == EAGAIN || errno == EWOULDBLOCK) {
>>> +                             struct pollfd pfd;
>>> +                             pfd.events = POLLIN;
>>> +                             pfd.fd = fd;
>>> +                             /*
>>> +                              * it is OK if this poll() failed; we
>>> +                              * want to leave this infinite loop
>>> +                              * only when read() returns with
>>> +                              * success, or an expected failure,
>>> +                              * which would be checked by the next
>>> +                              * call to read(2).
>>> +                              */
>>> +                             poll(&pfd, 1, -1);
> 
> Or do you mean to insert another continue in here?
I was thinking that we run into a similar loop as before:
read() returns -1; errno = EAGAIN  /* No data to read */
poll() returns -1; errno = EAGAIN /* poll failed. If the fd was OK, the failure may be temporary,
                                    as far as poll() can see.
                                    But most probably we ran out of memory */

So the code would look like this:

   if (!poll(&pfd, 1, -1))
      return -1;


> 

>>> +                     }
>>> +             }
>>>               return nr;
>>>       }
>>>  }
>>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

* Re: [PATCHv2 2/7] xread: poll on non blocking fds
  2015-12-17 20:42       ` Torsten Bögershausen
@ 2015-12-17 20:51         ` Stefan Beller
  2015-12-18  3:14           ` Jeff King
  2015-12-18  3:13         ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2015-12-17 20:51 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: git, Junio C Hamano, Jeff King, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine, Johannes Sixt

On Thu, Dec 17, 2015 at 12:42 PM, Torsten Bögershausen <tboegi@web.de> wrote:
> On 17.12.15 21:22, Stefan Beller wrote:
>> On Thu, Dec 17, 2015 at 12:12 PM, Torsten Bögershausen <tboegi@web.de> wrote:
>>> On 16.12.15 01:04, Stefan Beller wrote:
>>>> The man page of read(2) says:
>>>>
>>>>   EAGAIN The file descriptor fd refers to a file other than a socket
>>>>        and has been marked nonblocking (O_NONBLOCK), and the read
>>>>        would block.
>>>>
>>>>   EAGAIN or EWOULDBLOCK
>>>>        The file descriptor fd refers to a socket and has been marked
>>>>        nonblocking (O_NONBLOCK), and the read would block.  POSIX.1-2001
>>>>        allows either error to be returned for this case, and does not
>>>>        require these constants to have the same value, so a portable
>>>>        application should check for both possibilities.
>>>>
>>>> If we get an EAGAIN or EWOULDBLOCK the fd must have set O_NONBLOCK.
>>>> As the intent of xread is to read as much as possible either until the
>>>> fd is EOF or an actual error occurs, we can ease the feeder of the fd
>>>> by not spinning the whole time, but rather wait for it politely by not
>>>> busy waiting.
>>>>
>>>> We should not care if the call to poll failed, as we're in an infinite
>>>> loop and can only get out with the correct read().
>>> I'm not sure if this is valid under all circumstances:
>>> This is what "man poll" says under Linux:
>>> []
>>>  ENOMEM There was no space to allocate file descriptor tables.
>>> []
>>> And this is from Mac OS, ("BSD System Calls Manual")
>>> ERRORS
>>>      Poll() will fail if:
>>>
>>>      [EAGAIN]           Allocation of internal data structures fails.  A sub-
>>>                         sequent request may succeed.
>>> And this is opengroup:
>>> http://pubs.opengroup.org/onlinepubs/9699919799//functions/poll.html:
>>> [EAGAIN]
>>>     The allocation of internal data structures failed but a subsequent request may succeed.
>>>
>>> read() may return EAGAIN, but poll() may fail to allocate memory, and fail.
>>> Is it always guaranteed that the loop is terminated?
>>
>> In case poll() fails (assume it is a no-op), the logic should not have
>> changed with this patch?
>>
>> Looking closely:
>>
>>>>       while (1) {
>>>>               nr = read(fd, buf, len);
>>>> -             if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
>>>> -                     continue;
>>>> +             if (nr < 0) {
>>>> +                     if (errno == EINTR)
>>>> +                             continue;
>>>> +                     if (errno == EAGAIN || errno == EWOULDBLOCK) {
>>>> +                             struct pollfd pfd;
>>>> +                             pfd.events = POLLIN;
>>>> +                             pfd.fd = fd;
>>>> +                             /*
>>>> +                              * it is OK if this poll() failed; we
>>>> +                              * want to leave this infinite loop
>>>> +                              * only when read() returns with
>>>> +                              * success, or an expected failure,
>>>> +                              * which would be checked by the next
>>>> +                              * call to read(2).
>>>> +                              */
>>>> +                             poll(&pfd, 1, -1);
>>
>> Or do you mean to insert another continue in here?
> I was thinking that we run into a similar loop as before:
> read() returns -1; errno = EAGAIN  /* No data to read */

which is expected for non-blocking fds,

> poll() returns -1; errno = EAGAIN /* poll failed. If the fd was OK, the failure may be temporary,
>                                     as far as poll() can see.
>                                     But most probably we ran out of memory */

Before this patch we would not have called poll(), but just had a continue here,
so I think we need to have it here again regardless of poll()'s return
code.

If poll() fails because it is low on memory, that should not make this
function fail; we can still do as well as we did before by simply calling
read() again repeatedly, though?

So I'd be convinced now we'd want to have:

    poll(&pfd, 1, -1); /* this only buys time for the fd to
                          deliver data; if it fails we don't
                          care, but just fall back to the old
                          busy-spinning behavior from before
                          this patch */
    continue;


>
> So the code would look like this:
>
>    if (!poll(&pfd, 1, -1))
>       return -1;
>
>
>>
>
>>>> +                     }
>>>> +             }
>>>>               return nr;
>>>>       }
>>>>  }
>>>>
>>>
>

* Re: [PATCHv2 2/7] xread: poll on non blocking fds
  2015-12-16  0:04 ` [PATCHv2 2/7] xread: poll on non blocking fds Stefan Beller
  2015-12-17 20:12   ` Torsten Bögershausen
@ 2015-12-18  3:07   ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Jeff King @ 2015-12-18  3:07 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t

On Tue, Dec 15, 2015 at 04:04:07PM -0800, Stefan Beller wrote:

> If we get an EAGAIN or EWOULDBLOCK the fd must have set O_NONBLOCK.
> As the intent of xread is to read as much as possible either until the
> fd is EOF or an actual error occurs, we can ease the feeder of the fd
> by not spinning the whole time, but rather wait for it politely by not
> busy waiting.

I'm not sure this second sentence is true. The point of xread() is to do
a single read that makes forward progress (or returns a real error), not
to read as much as we can.

The rationale seems more to me like:

  The point of xread() is to perform read() until we either get some
  data, or encounter a "real" error. We do not count EINTR or EAGAIN as
  "real" errors, as a reader trying to make forward progress would
  generally just repeat the read, so we loop for them as a convenience.

  In the case of EINTR, trying the read() again immediately is fine; a
  signal interrupted us, but there is no reason to think the read()
  would not succeed if we tried it again.

  For EAGAIN, however, we know that the fd must have set O_NONBLOCK. In
  this case reading again immediately is not likely to produce results,
  and we will chew up CPU spinning on read() calls. Instead, let's
  insert a poll() to block until we actually get data.

I also don't know that this patch necessarily needs to be part of the
parallel-fetch series anymore. We are not setting O_NONBLOCK ourselves,
and I don't think the correctness of the new strbuf_read_once() is
impacted by this. I.e., I think this optimization applies equally well
to the hundreds of existing xread() calls, should they unexpectedly be
fed an O_NONBLOCK descriptor.

That said, I do not mind it here, but I think it could be its own
separate topic if you wanted to reduce the complexity of the
parallel-fetch topic.

> We should not care if the call to poll failed, as we're in an infinite
> loop and can only get out with the correct read().

You might want to expand on "correct" here. I think you mean "can only
get out with a read() that returns data or a "real" error" (I am reusing
the terminology I introduced above; I am fine with other terminology,
depending on how you word the rest of it).

-Peff

* Re: [PATCHv2 2/7] xread: poll on non blocking fds
  2015-12-17 20:42       ` Torsten Bögershausen
  2015-12-17 20:51         ` Stefan Beller
@ 2015-12-18  3:13         ` Jeff King
  2015-12-18  8:50           ` Torsten Bögershausen
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2015-12-18  3:13 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Stefan Beller, git, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine, Johannes Sixt

On Thu, Dec 17, 2015 at 09:42:01PM +0100, Torsten Bögershausen wrote:

> > Or do you mean to insert another continue in here?
> I was thinking that we run into a similar loop as before:
> read() returns -1; errno = EAGAIN  /* No data to read */
> poll() returns -1; errno = EAGAIN /* poll failed. If the fd was OK, the failure may be temporary,
>                                     as far as poll() can see.
>                                     But most probably we ran out of memory */
> 
> So the code would look like this:
> 
>    if (!poll(&pfd, 1, -1))
>       return -1;

That changes the semantics of the function. The poll() is just a
convenience to avoid spinning. If it fails, with Stefan's patch[1] the
worst case is that we would spin on read() and poll(), instead of
actually blocking in the poll().

But if we return on poll() failure, now the caller will see errors from
poll() even though they don't know or care that we called poll() in the
first place. Consider what would happen with your code if read got
EAGAIN and then poll got EINTR. We would report an error, even though
the whole point of xread() is to loop on these conditions.

-Peff

[1] Stefan's patch does have a bug. It forgets to "continue" after
    calling poll, which means we will block with poll and _then_ exit
    with -1, instead of restarting the loop.

* Re: [PATCHv2 2/7] xread: poll on non blocking fds
  2015-12-17 20:51         ` Stefan Beller
@ 2015-12-18  3:14           ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2015-12-18  3:14 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Torsten Bögershausen, git, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine, Johannes Sixt

On Thu, Dec 17, 2015 at 12:51:08PM -0800, Stefan Beller wrote:

> > poll() returns -1; errno = EAGAIN /* poll failed. If the fd was OK, the failure may be temporary,
> >                                     as far as poll() can see.
> >                                     But most probably we ran out of memory */
> 
> Before this patch we would not have asked poll, but had just a continue here,
> so I think we need to have it here again no matter of the return code
> of the poll.
> 
> If poll determines it is low on memory, this should not make this function fail,
> we can still do as good as we did before by just asking read
> repeatedly again, though?
> 
> So I'd be convinced now we'd want to have:
> 
>     poll(&pfd, 1, -1); /* this only buys time for the fd to
>                           deliver data; if it fails we don't
>                           care, but just fall back to the old
>                           busy-spinning behavior from before
>                           this patch */
>     continue;

Right. I think that is the only sensible thing, and your comment
explains perfectly what is going on.

-Peff

* Re: [PATCHv2 2/7] xread: poll on non blocking fds
  2015-12-18  3:13         ` Jeff King
@ 2015-12-18  8:50           ` Torsten Bögershausen
  2015-12-18 18:12             ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Torsten Bögershausen @ 2015-12-18  8:50 UTC (permalink / raw)
  To: Jeff King, Torsten Bögershausen
  Cc: Stefan Beller, git, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine, Johannes Sixt

On 18.12.15 04:13, Jeff King wrote:
> On Thu, Dec 17, 2015 at 09:42:01PM +0100, Torsten Bögershausen wrote:
> 
>>> Or do you mean to insert another continue in here?
>> I was thinking that we run into a similar loop as before:
>> read() returns -1; errno = EAGAIN  /* No data to read */
>> poll() returns -1; errno = EAGAIN /* poll failed. If the fd was OK, the failure may be temporary,
>>                                     as far as poll() can see.
>>                                     But most probably we ran out of memory */
>>
>> So the code would look like this:
>>
>>    if (!poll(&pfd, 1, -1))
>>       return -1;
> 
> That changes the semantics of the function. The poll() is just a
> convenience to avoid spinning. If it fails, with Stefan's patch[1] the
> worst case is that we would spin on read() and poll(), instead of
> actually blocking in the poll().
> 
> But if we return on poll() failure, now the caller will see errors from
> poll() even though they don't know or care that we called poll() in the
> first place. Consider what would happen with your code if read got
> EAGAIN and then poll got EINTR. We would report an error, even though
> the whole point of xread() is to loop on these conditions.
> 
> -Peff
> 
> [1] Stefan's patch does have a bug. It forgets to "continue" after
>     calling poll, which means we will block with poll and _then_ exit
>     with -1, instead of restarting the loop.
> --
I love this group: Curing one bug with another doesn't work :-)

/* So the code v2 would look like this: */

    if (!poll(&pfd, 1, -1)) {
        if (errno == EINTR)
            continue;
        return -1; /* poll() failed, this is serious. */
    }

* Re: [PATCHv2 2/7] xread: poll on non blocking fds
  2015-12-18  8:50           ` Torsten Bögershausen
@ 2015-12-18 18:12             ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2015-12-18 18:12 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Stefan Beller, git, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine, Johannes Sixt

On Fri, Dec 18, 2015 at 09:50:46AM +0100, Torsten Bögershausen wrote:

> >> So the code would look like this:
> >>
> >>    if (!poll(&pfd, 1, -1))
> >>       return -1;
> > 
> > That changes the semantics of the function. The poll() is just a
> > convenience to avoid spinning. If it fails, with Stefan's patch[1] the
> > worst case is that we would spin on read() and poll(), instead of
> > actually blocking in the poll().
> > 
> > But if we return on poll() failure, now the caller will see errors from
> > poll() even though they don't know or care that we called poll() in the
> > first place. Consider what would happen with your code if read got
> > EAGAIN and then poll got EINTR. We would report an error, even though
> > the whole point of xread() is to loop on these conditions.
> [...]
>
> /* So the code v2 would look like this: */
> 
>     if (!poll(&pfd, 1, -1)) {
>         if (errno == EINTR)
>             continue;
>          return -1; /* poll() failed, this is serious. */
>     }

That solves the EINTR problem, but I still don't see why we want to
return -1. The caller asked us to read(). We know that read() did not
fail with an actual error. Yet we are going to return an error to the
user, with errno set to something related only to poll(). I think we are
better off to keep the same semantics from the caller's point of view:
we loop until read() returns forward progress or a real error, and
anything else we do is a behind-the-scenes optimization.

BTW, I am assuming you mean:

  if (poll(&pfd, 1, -1) < 0)
	...

in your examples. Returning "0" means that poll timed out, but of course
we are not providing a timeout.

-Peff
