git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Parallel Checkout (part 2)
@ 2021-03-17 21:12 Matheus Tavares
  2021-03-17 21:12 ` [PATCH 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
                   ` (7 more replies)
  0 siblings, 8 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-03-17 21:12 UTC (permalink / raw)
  To: git; +Cc: christian.couder, gitster, git

This is the next step in the parallel checkout implementation. An
overview of the complete series can be seen at [1]. 

The last patch in this series adds a design doc, so it may help to
review it first. Also, there is no need to have any familiarity with
part-1, as this part doesn't have any semantic dependency with that.

This series is based on the merge of 'mt/parallel-checkout-part-1' and
'master', so that it can use the "brew cast" fix and the latest security
fix (both from master), to run the tests. (The merge is textually
clean, but it needs a small semantic fix: the '#include "entry.h"'
addition in builtin/stash.c).

Parallel-checkout-specific tests will be added in part-3.

[1]: https://lore.kernel.org/git/cover.1604521275.git.matheus.bernardino@usp.br/

Matheus Tavares (5):
  unpack-trees: add basic support for parallel checkout
  parallel-checkout: make it truly parallel
  parallel-checkout: add configuration options
  parallel-checkout: support progress displaying
  parallel-checkout: add design documentation

 .gitignore                                    |   1 +
 Documentation/Makefile                        |   1 +
 Documentation/config/checkout.txt             |  21 +
 Documentation/technical/parallel-checkout.txt | 262 ++++++++
 Makefile                                      |   2 +
 builtin.h                                     |   1 +
 builtin/checkout--helper.c                    | 142 ++++
 entry.c                                       |  17 +-
 git.c                                         |   2 +
 parallel-checkout.c                           | 624 ++++++++++++++++++
 parallel-checkout.h                           | 111 ++++
 unpack-trees.c                                |  19 +-
 12 files changed, 1198 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/technical/parallel-checkout.txt
 create mode 100644 builtin/checkout--helper.c
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h

-- 
2.30.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 1/5] unpack-trees: add basic support for parallel checkout
  2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
@ 2021-03-17 21:12 ` Matheus Tavares
  2021-03-31  4:22   ` Christian Couder
  2021-03-17 21:12 ` [PATCH 2/5] parallel-checkout: make it truly parallel Matheus Tavares
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 40+ messages in thread
From: Matheus Tavares @ 2021-03-17 21:12 UTC (permalink / raw)
  To: git; +Cc: christian.couder, gitster, git

This new interface allows us to enqueue some of the entries being
checked out to later uncompress, smudge, and write them in parallel. For
now, the parallel checkout machinery is enabled by default and there is
no user configuration, but run_parallel_checkout() just writes the
queued entries in sequence (without spawning additional workers). The
next patch will actually implement the parallelism and, later, we will
make it configurable.

Note that, to avoid potential data races, not all entries are eligible
for parallel checkout. Also, paths that collide on disk (e.g.
case-sensitive paths in case-insensitive file systems), are detected by
the parallel checkout code and skipped, so that they can be safely
sequentially handled later. The collision detection works like the
following:

- If the collision was at basename (e.g. 'a/b' and 'a/B'), the framework
  detects it by looking for EEXIST and EISDIR errors after an
  open(O_CREAT | O_EXCL) failure.

- If the collision was at dirname (e.g. 'a/b' and 'A'), it is detected
  at the has_dirs_only_path() check, which is done for the leading path
  of each item in the parallel checkout queue.

Both verifications rely on the fact that, before enqueueing an entry for
parallel checkout, checkout_entry() makes sure that there is no file at
the entry's path and that its leading components are all real
directories. So, any later change in these conditions indicates that
there was a collision (either between two parallel-eligible entries or
between an eligible and an ineligible one).

After all parallel-eligible entries have been processed, the collided
(and thus, skipped) entries are sequentially fed to checkout_entry()
again. This is similar to the way the current code deals with
collisions, overwriting the previously checked out entries with the
subsequent ones. The only difference is that, since we no longer create
the files in the same order that they appear on index, we are not able
to determine which of the colliding entries will survive on disk (for
the classic code, it is always the last entry).

Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Makefile            |   1 +
 entry.c             |  17 +-
 parallel-checkout.c | 368 ++++++++++++++++++++++++++++++++++++++++++++
 parallel-checkout.h |  32 ++++
 unpack-trees.c      |   6 +-
 5 files changed, 421 insertions(+), 3 deletions(-)
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h

diff --git a/Makefile b/Makefile
index dfb0f1000f..f7b9ab49f9 100644
--- a/Makefile
+++ b/Makefile
@@ -938,6 +938,7 @@ LIB_OBJS += pack-revindex.o
 LIB_OBJS += pack-write.o
 LIB_OBJS += packfile.o
 LIB_OBJS += pager.o
+LIB_OBJS += parallel-checkout.o
 LIB_OBJS += parse-options-cb.o
 LIB_OBJS += parse-options.o
 LIB_OBJS += patch-delta.o
diff --git a/entry.c b/entry.c
index 2ce16414a7..6a22c45050 100644
--- a/entry.c
+++ b/entry.c
@@ -7,6 +7,7 @@
 #include "progress.h"
 #include "fsmonitor.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -426,8 +427,17 @@ static void mark_colliding_entries(const struct checkout *state,
 	for (i = 0; i < state->istate->cache_nr; i++) {
 		struct cache_entry *dup = state->istate->cache[i];
 
-		if (dup == ce)
-			break;
+		if (dup == ce) {
+			/*
+			 * Parallel checkout doesn't create the files in index
+			 * order. So the other side of the collision may appear
+			 * after the given cache_entry in the array.
+			 */
+			if (parallel_checkout_status() == PC_RUNNING)
+				continue;
+			else
+				break;
+		}
 
 		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
 			continue;
@@ -536,6 +546,9 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
 		ca = &ca_buf;
 	}
 
+	if (!enqueue_checkout(ce, ca))
+		return 0;
+
 	return write_entry(ce, path.buf, ca, state, 0);
 }
 
diff --git a/parallel-checkout.c b/parallel-checkout.c
new file mode 100644
index 0000000000..80a60eb2d3
--- /dev/null
+++ b/parallel-checkout.c
@@ -0,0 +1,368 @@
+#include "cache.h"
+#include "entry.h"
+#include "parallel-checkout.h"
+#include "streaming.h"
+
+enum pc_item_status {
+	PC_ITEM_PENDING = 0,
+	PC_ITEM_WRITTEN,
+	/*
+	 * The entry could not be written because there was another file
+	 * already present in its path or leading directories. Since
+	 * checkout_entry_ca() removes such files from the working tree before
+	 * enqueueing the entry for parallel checkout, it means that there was
+	 * a path collision among the entries being written.
+	 */
+	PC_ITEM_COLLIDED,
+	PC_ITEM_FAILED,
+};
+
+struct parallel_checkout_item {
+	/* pointer to a istate->cache[] entry. Not owned by us. */
+	struct cache_entry *ce;
+	struct conv_attrs ca;
+	struct stat st;
+	enum pc_item_status status;
+};
+
+struct parallel_checkout {
+	enum pc_status status;
+	struct parallel_checkout_item *items;
+	size_t nr, alloc;
+};
+
+static struct parallel_checkout parallel_checkout;
+
+enum pc_status parallel_checkout_status(void)
+{
+	return parallel_checkout.status;
+}
+
+void init_parallel_checkout(void)
+{
+	if (parallel_checkout.status != PC_UNINITIALIZED)
+		BUG("parallel checkout already initialized");
+
+	parallel_checkout.status = PC_ACCEPTING_ENTRIES;
+}
+
+static void finish_parallel_checkout(void)
+{
+	if (parallel_checkout.status == PC_UNINITIALIZED)
+		BUG("cannot finish parallel checkout: not initialized yet");
+
+	free(parallel_checkout.items);
+	memset(&parallel_checkout, 0, sizeof(parallel_checkout));
+}
+
+static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
+					     const struct conv_attrs *ca)
+{
+	enum conv_attrs_classification c;
+
+	/*
+	 * Symlinks cannot be checked out in parallel as, in case of path
+	 * collision, they could racily replace leading directories of other
+	 * entries being checked out. Submodules are checked out in child
+	 * processes, which have their own parallel checkout queues.
+	 */
+	if (!S_ISREG(ce->ce_mode))
+		return 0;
+
+	c = classify_conv_attrs(ca);
+	switch (c) {
+	case CA_CLASS_INCORE:
+		return 1;
+
+	case CA_CLASS_INCORE_FILTER:
+		/*
+		 * It would be safe to allow concurrent instances of
+		 * single-file smudge filters, like rot13, but we should not
+		 * assume that all filters are parallel-process safe. So we
+		 * don't allow this.
+		 */
+		return 0;
+
+	case CA_CLASS_INCORE_PROCESS:
+		/*
+		 * The parallel queue and the delayed queue are not compatible,
+		 * so they must be kept completely separated. And we can't tell
+		 * if a long-running process will delay its response without
+		 * actually asking it to perform the filtering. Therefore, this
+		 * type of filter is not allowed in parallel checkout.
+		 *
+		 * Furthermore, there should only be one instance of the
+		 * long-running process filter as we don't know how it is
+		 * managing its own concurrency. So, spreading the entries that
+		 * requisite such a filter among the parallel workers would
+		 * require a lot more inter-process communication. We would
+		 * probably have to designate a single process to interact with
+		 * the filter and send all the necessary data to it, for each
+		 * entry.
+		 */
+		return 0;
+
+	case CA_CLASS_STREAMABLE:
+		return 1;
+
+	default:
+		BUG("unsupported conv_attrs classification '%d'", c);
+	}
+}
+
+int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
+{
+	struct parallel_checkout_item *pc_item;
+
+	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES ||
+	    !is_eligible_for_parallel_checkout(ce, ca))
+		return -1;
+
+	ALLOC_GROW(parallel_checkout.items, parallel_checkout.nr + 1,
+		   parallel_checkout.alloc);
+
+	pc_item = &parallel_checkout.items[parallel_checkout.nr++];
+	pc_item->ce = ce;
+	memcpy(&pc_item->ca, ca, sizeof(pc_item->ca));
+	pc_item->status = PC_ITEM_PENDING;
+
+	return 0;
+}
+
+static int handle_results(struct checkout *state)
+{
+	int ret = 0;
+	size_t i;
+	int have_pending = 0;
+
+	/*
+	 * We first update the successfully written entries with the collected
+	 * stat() data, so that they can be found by mark_colliding_entries(),
+	 * in the next loop, when necessary.
+	 */
+	for (i = 0; i < parallel_checkout.nr; i++) {
+		struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+		if (pc_item->status == PC_ITEM_WRITTEN)
+			update_ce_after_write(state, pc_item->ce, &pc_item->st);
+	}
+
+	for (i = 0; i < parallel_checkout.nr; i++) {
+		struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+
+		switch(pc_item->status) {
+		case PC_ITEM_WRITTEN:
+			/* Already handled */
+			break;
+		case PC_ITEM_COLLIDED:
+			/*
+			 * The entry could not be checked out due to a path
+			 * collision with another entry. Since there can only
+			 * be one entry of each colliding group on the disk, we
+			 * could skip trying to check out this one and move on.
+			 * However, this would leave the unwritten entries with
+			 * null stat() fields on the index, which could
+			 * potentially slow down subsequent operations that
+			 * require refreshing it: git would not be able to
+			 * trust st_size and would have to go to the filesystem
+			 * to see if the contents match (see ie_modified()).
+			 *
+			 * Instead, let's pay the overhead only once, now, and
+			 * call checkout_entry_ca() again for this file, to
+			 * have it's stat() data stored in the index. This also
+			 * has the benefit of adding this entry and its
+			 * colliding pair to the collision report message.
+			 * Additionally, this overwriting behavior is consistent
+			 * with what the sequential checkout does, so it doesn't
+			 * add any extra overhead.
+			 */
+			ret |= checkout_entry_ca(pc_item->ce, &pc_item->ca,
+						 state, NULL, NULL);
+			break;
+		case PC_ITEM_PENDING:
+			have_pending = 1;
+			/* fall through */
+		case PC_ITEM_FAILED:
+			ret = -1;
+			break;
+		default:
+			BUG("unknown checkout item status in parallel checkout");
+		}
+	}
+
+	if (have_pending)
+		error(_("parallel checkout finished with pending entries"));
+
+	return ret;
+}
+
+static int reset_fd(int fd, const char *path)
+{
+	if (lseek(fd, 0, SEEK_SET) != 0)
+		return error_errno("failed to rewind descriptor of %s", path);
+	if (ftruncate(fd, 0))
+		return error_errno("failed to truncate file %s", path);
+	return 0;
+}
+
+static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
+			       const char *path)
+{
+	int ret;
+	struct stream_filter *filter;
+	struct strbuf buf = STRBUF_INIT;
+	char *new_blob;
+	unsigned long size;
+	size_t newsize = 0;
+	ssize_t wrote;
+
+	/* Sanity check */
+	assert(is_eligible_for_parallel_checkout(pc_item->ce, &pc_item->ca));
+
+	filter = get_stream_filter_ca(&pc_item->ca, &pc_item->ce->oid);
+	if (filter) {
+		if (stream_blob_to_fd(fd, &pc_item->ce->oid, filter, 1)) {
+			/* On error, reset fd to try writing without streaming */
+			if (reset_fd(fd, path))
+				return -1;
+		} else {
+			return 0;
+		}
+	}
+
+	new_blob = read_blob_entry(pc_item->ce, &size);
+	if (!new_blob)
+		return error("unable to read sha1 file of %s (%s)", path,
+			     oid_to_hex(&pc_item->ce->oid));
+
+	/*
+	 * checkout metadata is used to give context for external process
+	 * filters. Files requiring such filters are not eligible for parallel
+	 * checkout, so pass NULL.
+	 */
+	ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
+					 new_blob, size, &buf, NULL);
+
+	if (ret) {
+		free(new_blob);
+		new_blob = strbuf_detach(&buf, &newsize);
+		size = newsize;
+	}
+
+	wrote = write_in_full(fd, new_blob, size);
+	free(new_blob);
+	if (wrote < 0)
+		return error("unable to write file %s", path);
+
+	return 0;
+}
+
+static int close_and_clear(int *fd)
+{
+	int ret = 0;
+
+	if (*fd >= 0) {
+		ret = close(*fd);
+		*fd = -1;
+	}
+
+	return ret;
+}
+
+static void write_pc_item(struct parallel_checkout_item *pc_item,
+			  struct checkout *state)
+{
+	unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
+	int fd = -1, fstat_done = 0;
+	struct strbuf path = STRBUF_INIT;
+	const char *dir_sep;
+
+	strbuf_add(&path, state->base_dir, state->base_dir_len);
+	strbuf_add(&path, pc_item->ce->name, pc_item->ce->ce_namelen);
+
+	dir_sep = find_last_dir_sep(path.buf);
+
+	/*
+	 * The leading dirs should have been already created by now. But, in
+	 * case of path collisions, one of the dirs could have been replaced by
+	 * a symlink (checked out after we enqueued this entry for parallel
+	 * checkout). Thus, we must check the leading dirs again.
+	 */
+	if (dir_sep && !has_dirs_only_path(path.buf, dir_sep - path.buf,
+					   state->base_dir_len)) {
+		pc_item->status = PC_ITEM_COLLIDED;
+		goto out;
+	}
+
+	fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, mode);
+
+	if (fd < 0) {
+		if (errno == EEXIST || errno == EISDIR) {
+			/*
+			 * Errors which probably represent a path collision.
+			 * Suppress the error message and mark the item to be
+			 * retried later, sequentially. ENOTDIR and ENOENT are
+			 * also interesting, but the above has_dirs_only_path()
+			 * call should have already caught these cases.
+			 */
+			pc_item->status = PC_ITEM_COLLIDED;
+		} else {
+			error_errno("failed to open file %s", path.buf);
+			pc_item->status = PC_ITEM_FAILED;
+		}
+		goto out;
+	}
+
+	if (write_pc_item_to_fd(pc_item, fd, path.buf)) {
+		/* Error was already reported. */
+		pc_item->status = PC_ITEM_FAILED;
+		goto out;
+	}
+
+	fstat_done = fstat_checkout_output(fd, state, &pc_item->st);
+
+	if (close_and_clear(&fd)) {
+		error_errno("unable to close file %s", path.buf);
+		pc_item->status = PC_ITEM_FAILED;
+		goto out;
+	}
+
+	if (state->refresh_cache && !fstat_done && lstat(path.buf, &pc_item->st) < 0) {
+		error_errno("unable to stat just-written file %s",  path.buf);
+		pc_item->status = PC_ITEM_FAILED;
+		goto out;
+	}
+
+	pc_item->status = PC_ITEM_WRITTEN;
+
+out:
+	/*
+	 * No need to check close() return at this point. Either fd is already
+	 * closed, or we are on an error path.
+	 */
+	close_and_clear(&fd);
+	strbuf_release(&path);
+}
+
+static void write_items_sequentially(struct checkout *state)
+{
+	size_t i;
+
+	for (i = 0; i < parallel_checkout.nr; i++)
+		write_pc_item(&parallel_checkout.items[i], state);
+}
+
+int run_parallel_checkout(struct checkout *state)
+{
+	int ret;
+
+	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
+		BUG("cannot run parallel checkout: uninitialized or already running");
+
+	parallel_checkout.status = PC_RUNNING;
+
+	write_items_sequentially(state);
+	ret = handle_results(state);
+
+	finish_parallel_checkout();
+	return ret;
+}
diff --git a/parallel-checkout.h b/parallel-checkout.h
new file mode 100644
index 0000000000..4ad2a519b3
--- /dev/null
+++ b/parallel-checkout.h
@@ -0,0 +1,32 @@
+#ifndef PARALLEL_CHECKOUT_H
+#define PARALLEL_CHECKOUT_H
+
+struct cache_entry;
+struct checkout;
+struct conv_attrs;
+
+enum pc_status {
+	PC_UNINITIALIZED = 0,
+	PC_ACCEPTING_ENTRIES,
+	PC_RUNNING,
+};
+
+enum pc_status parallel_checkout_status(void);
+
+/*
+ * Put parallel checkout into the PC_ACCEPTING_ENTRIES state. Should be used
+ * only when in the PC_UNINITIALIZED state.
+ */
+void init_parallel_checkout(void);
+
+/*
+ * Return -1 if parallel checkout is currently not accepting entries or if the
+ * entry is not eligible for parallel checkout. Otherwise, enqueue the entry
+ * for later write and return 0.
+ */
+int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
+
+/* Write all the queued entries, returning 0 on success.*/
+int run_parallel_checkout(struct checkout *state);
+
+#endif /* PARALLEL_CHECKOUT_H */
diff --git a/unpack-trees.c b/unpack-trees.c
index 5b3dd38f8c..b9548de96a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -17,6 +17,7 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -441,7 +442,6 @@ static int check_updates(struct unpack_trees_options *o,
 	if (should_update_submodules())
 		load_gitmodules_file(index, &state);
 
-	enable_delayed_checkout(&state);
 	if (has_promisor_remote()) {
 		/*
 		 * Prefetch the objects that are to be checked out in the loop
@@ -464,6 +464,9 @@ static int check_updates(struct unpack_trees_options *o,
 					   to_fetch.oid, to_fetch.nr);
 		oid_array_clear(&to_fetch);
 	}
+
+	enable_delayed_checkout(&state);
+	init_parallel_checkout();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -477,6 +480,7 @@ static int check_updates(struct unpack_trees_options *o,
 		}
 	}
 	stop_progress(&progress);
+	errs |= run_parallel_checkout(&state);
 	errs |= finish_delayed_checkout(&state, NULL);
 	git_attr_set_direction(GIT_ATTR_CHECKIN);
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 2/5] parallel-checkout: make it truly parallel
  2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
  2021-03-17 21:12 ` [PATCH 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
@ 2021-03-17 21:12 ` Matheus Tavares
  2021-03-31  4:32   ` Christian Couder
  2021-03-17 21:12 ` [PATCH 3/5] parallel-checkout: add configuration options Matheus Tavares
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 40+ messages in thread
From: Matheus Tavares @ 2021-03-17 21:12 UTC (permalink / raw)
  To: git; +Cc: christian.couder, gitster, git

Use multiple worker processes to distribute the queued entries and call
write_pc_item() in parallel for them. The items are distributed
uniformly in contiguous chunks. This minimizes the chances of two
workers writing to the same directory simultaneously, which could affect
performance due to lock contention in the kernel. Work stealing (or any
other format of re-distribution) is not implemented yet.

The protocol between the main process and the workers is quite simple.
They exchange binary messages packed in pkt-line format, and use
PKT-FLUSH to mark the end of input (from both sides). The main process
starts the communication by sending N pkt-lines, each corresponding to
an item that needs to be written. These packets contain all the
necessary information to load, smudge, and write the blob associated
with each item. Then it waits for the worker to send back N pkt-lines
containing the results for each item. The resulting packet must contain:
the identification number of the item that it refers to, the status of
the operation, and the lstat() data gathered after writing the file (iff
the operation was successful).

For now, checkout always uses a hardcoded value of 2 workers, only to
demonstrate that the parallel checkout framework correctly divides and
writes the queued entries. The next patch will add user configurations
and define a more reasonable default, based on tests with the said
settings.

Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 .gitignore                 |   1 +
 Makefile                   |   1 +
 builtin.h                  |   1 +
 builtin/checkout--helper.c | 142 ++++++++++++++++++++
 git.c                      |   2 +
 parallel-checkout.c        | 267 +++++++++++++++++++++++++++++++++----
 parallel-checkout.h        |  73 +++++++++-
 7 files changed, 460 insertions(+), 27 deletions(-)
 create mode 100644 builtin/checkout--helper.c

diff --git a/.gitignore b/.gitignore
index 3dcdb6bb5a..26f8ddfc55 100644
--- a/.gitignore
+++ b/.gitignore
@@ -33,6 +33,7 @@
 /git-check-mailmap
 /git-check-ref-format
 /git-checkout
+/git-checkout--helper
 /git-checkout-index
 /git-cherry
 /git-cherry-pick
diff --git a/Makefile b/Makefile
index f7b9ab49f9..29df386d8a 100644
--- a/Makefile
+++ b/Makefile
@@ -1054,6 +1054,7 @@ BUILTIN_OBJS += builtin/check-attr.o
 BUILTIN_OBJS += builtin/check-ignore.o
 BUILTIN_OBJS += builtin/check-mailmap.o
 BUILTIN_OBJS += builtin/check-ref-format.o
+BUILTIN_OBJS += builtin/checkout--helper.o
 BUILTIN_OBJS += builtin/checkout-index.o
 BUILTIN_OBJS += builtin/checkout.o
 BUILTIN_OBJS += builtin/clean.o
diff --git a/builtin.h b/builtin.h
index b6ce981b73..a3a3f7abb1 100644
--- a/builtin.h
+++ b/builtin.h
@@ -123,6 +123,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix);
 int cmd_bundle(int argc, const char **argv, const char *prefix);
 int cmd_cat_file(int argc, const char **argv, const char *prefix);
 int cmd_checkout(int argc, const char **argv, const char *prefix);
+int cmd_checkout__helper(int argc, const char **argv, const char *prefix);
 int cmd_checkout_index(int argc, const char **argv, const char *prefix);
 int cmd_check_attr(int argc, const char **argv, const char *prefix);
 int cmd_check_ignore(int argc, const char **argv, const char *prefix);
diff --git a/builtin/checkout--helper.c b/builtin/checkout--helper.c
new file mode 100644
index 0000000000..df9a6fab1e
--- /dev/null
+++ b/builtin/checkout--helper.c
@@ -0,0 +1,142 @@
+#include "builtin.h"
+#include "config.h"
+#include "entry.h"
+#include "parallel-checkout.h"
+#include "parse-options.h"
+#include "pkt-line.h"
+
+static void packet_to_pc_item(char *line, int len,
+			      struct parallel_checkout_item *pc_item)
+{
+	struct pc_item_fixed_portion *fixed_portion;
+	char *encoding, *variant;
+
+	if (len < sizeof(struct pc_item_fixed_portion))
+		BUG("checkout worker received too short item (got %dB, exp %dB)",
+		    len, (int)sizeof(struct pc_item_fixed_portion));
+
+	fixed_portion = (struct pc_item_fixed_portion *)line;
+
+	if (len - sizeof(struct pc_item_fixed_portion) !=
+		fixed_portion->name_len + fixed_portion->working_tree_encoding_len)
+		BUG("checkout worker received corrupted item");
+
+	variant = line + sizeof(struct pc_item_fixed_portion);
+
+	/*
+	 * Note: the main process uses zero length to communicate that the
+	 * encoding is NULL. There is no use case that requires sending an
+	 * actual empty string, since convert_attrs() never sets
+	 * ca.working_tree_enconding to "".
+	 */
+	if (fixed_portion->working_tree_encoding_len) {
+		encoding = xmemdupz(variant,
+				    fixed_portion->working_tree_encoding_len);
+		variant += fixed_portion->working_tree_encoding_len;
+	} else {
+		encoding = NULL;
+	}
+
+	memset(pc_item, 0, sizeof(*pc_item));
+	pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len);
+	pc_item->ce->ce_namelen = fixed_portion->name_len;
+	pc_item->ce->ce_mode = fixed_portion->ce_mode;
+	memcpy(pc_item->ce->name, variant, pc_item->ce->ce_namelen);
+	oidcpy(&pc_item->ce->oid, &fixed_portion->oid);
+
+	pc_item->id = fixed_portion->id;
+	pc_item->ca.crlf_action = fixed_portion->crlf_action;
+	pc_item->ca.ident = fixed_portion->ident;
+	pc_item->ca.working_tree_encoding = encoding;
+}
+
+static void report_result(struct parallel_checkout_item *pc_item)
+{
+	struct pc_item_result res;
+	size_t size;
+
+	res.id = pc_item->id;
+	res.status = pc_item->status;
+
+	if (pc_item->status == PC_ITEM_WRITTEN) {
+		res.st = pc_item->st;
+		size = sizeof(res);
+	} else {
+		size = PC_ITEM_RESULT_BASE_SIZE;
+	}
+
+	packet_write(1, (const char *)&res, size);
+}
+
+/* Free the worker-side malloced data, but not pc_item itself. */
+static void release_pc_item_data(struct parallel_checkout_item *pc_item)
+{
+	free((char *)pc_item->ca.working_tree_encoding);
+	discard_cache_entry(pc_item->ce);
+}
+
+static void worker_loop(struct checkout *state)
+{
+	struct parallel_checkout_item *items = NULL;
+	size_t i, nr = 0, alloc = 0;
+
+	while (1) {
+		int len;
+		char *line = packet_read_line(0, &len);
+
+		if (!line)
+			break;
+
+		ALLOC_GROW(items, nr + 1, alloc);
+		packet_to_pc_item(line, len, &items[nr++]);
+	}
+
+	for (i = 0; i < nr; i++) {
+		struct parallel_checkout_item *pc_item = &items[i];
+		write_pc_item(pc_item, state);
+		report_result(pc_item);
+		release_pc_item_data(pc_item);
+	}
+
+	packet_flush(1);
+
+	free(items);
+}
+
+static const char * const checkout_helper_usage[] = {
+	N_("git checkout--helper [<options>]"),
+	NULL
+};
+
+int cmd_checkout__helper(int argc, const char **argv, const char *prefix)
+{
+	struct checkout state = CHECKOUT_INIT;
+	struct option checkout_helper_options[] = {
+		OPT_STRING(0, "prefix", &state.base_dir, N_("string"),
+			N_("when creating files, prepend <string>")),
+		OPT_END()
+	};
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(checkout_helper_usage,
+				   checkout_helper_options);
+
+	git_config(git_default_config, NULL);
+	argc = parse_options(argc, argv, prefix, checkout_helper_options,
+			     checkout_helper_usage, 0);
+	if (argc > 0)
+		usage_with_options(checkout_helper_usage, checkout_helper_options);
+
+	if (state.base_dir)
+		state.base_dir_len = strlen(state.base_dir);
+
+	/*
+	 * Setting this on a worker won't actually update the index. We just
+	 * need to tell the checkout machinery to lstat() the written entries,
+	 * so that we can send this data back to the main process.
+	 */
+	state.refresh_cache = 1;
+
+	worker_loop(&state);
+	return 0;
+}
diff --git a/git.c b/git.c
index 9bc077a025..a9cd4b1e9a 100644
--- a/git.c
+++ b/git.c
@@ -490,6 +490,8 @@ static struct cmd_struct commands[] = {
 	{ "check-mailmap", cmd_check_mailmap, RUN_SETUP },
 	{ "check-ref-format", cmd_check_ref_format, NO_PARSEOPT  },
 	{ "checkout", cmd_checkout, RUN_SETUP | NEED_WORK_TREE },
+	{ "checkout--helper", cmd_checkout__helper,
+		RUN_SETUP | NEED_WORK_TREE | SUPPORT_SUPER_PREFIX },
 	{ "checkout-index", cmd_checkout_index,
 		RUN_SETUP | NEED_WORK_TREE},
 	{ "cherry", cmd_cherry, RUN_SETUP },
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 80a60eb2d3..df447aa3a6 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -1,28 +1,13 @@
 #include "cache.h"
 #include "entry.h"
 #include "parallel-checkout.h"
+#include "pkt-line.h"
+#include "run-command.h"
 #include "streaming.h"
 
-enum pc_item_status {
-	PC_ITEM_PENDING = 0,
-	PC_ITEM_WRITTEN,
-	/*
-	 * The entry could not be written because there was another file
-	 * already present in its path or leading directories. Since
-	 * checkout_entry_ca() removes such files from the working tree before
-	 * enqueueing the entry for parallel checkout, it means that there was
-	 * a path collision among the entries being written.
-	 */
-	PC_ITEM_COLLIDED,
-	PC_ITEM_FAILED,
-};
-
-struct parallel_checkout_item {
-	/* pointer to a istate->cache[] entry. Not owned by us. */
-	struct cache_entry *ce;
-	struct conv_attrs ca;
-	struct stat st;
-	enum pc_item_status status;
+struct pc_worker {
+	struct child_process cp;
+	size_t next_item_to_complete, nr_items_to_complete;
 };
 
 struct parallel_checkout {
@@ -121,10 +106,12 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
 	ALLOC_GROW(parallel_checkout.items, parallel_checkout.nr + 1,
 		   parallel_checkout.alloc);
 
-	pc_item = &parallel_checkout.items[parallel_checkout.nr++];
+	pc_item = &parallel_checkout.items[parallel_checkout.nr];
 	pc_item->ce = ce;
 	memcpy(&pc_item->ca, ca, sizeof(pc_item->ca));
 	pc_item->status = PC_ITEM_PENDING;
+	pc_item->id = parallel_checkout.nr;
+	parallel_checkout.nr++;
 
 	return 0;
 }
@@ -237,7 +224,8 @@ static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
 	/*
 	 * checkout metadata is used to give context for external process
 	 * filters. Files requiring such filters are not eligible for parallel
-	 * checkout, so pass NULL.
+	 * checkout, so pass NULL. Note: if that changes, the metadata must also
+	 * be passed from the main process to the workers.
 	 */
 	ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
 					 new_blob, size, &buf, NULL);
@@ -268,8 +256,8 @@ static int close_and_clear(int *fd)
 	return ret;
 }
 
-static void write_pc_item(struct parallel_checkout_item *pc_item,
-			  struct checkout *state)
+void write_pc_item(struct parallel_checkout_item *pc_item,
+		   struct checkout *state)
 {
 	unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
 	int fd = -1, fstat_done = 0;
@@ -343,6 +331,221 @@ static void write_pc_item(struct parallel_checkout_item *pc_item,
 	strbuf_release(&path);
 }
 
+static void send_one_item(int fd, struct parallel_checkout_item *pc_item)
+{
+	size_t len_data;
+	char *data, *variant;
+	struct pc_item_fixed_portion *fixed_portion;
+	const char *working_tree_encoding = pc_item->ca.working_tree_encoding;
+	size_t name_len = pc_item->ce->ce_namelen;
+	size_t working_tree_encoding_len = working_tree_encoding ?
+					   strlen(working_tree_encoding) : 0;
+
+	len_data = sizeof(struct pc_item_fixed_portion) + name_len +
+		   working_tree_encoding_len;
+
+	data = xcalloc(1, len_data);
+
+	fixed_portion = (struct pc_item_fixed_portion *)data;
+	fixed_portion->id = pc_item->id;
+	fixed_portion->ce_mode = pc_item->ce->ce_mode;
+	fixed_portion->crlf_action = pc_item->ca.crlf_action;
+	fixed_portion->ident = pc_item->ca.ident;
+	fixed_portion->name_len = name_len;
+	fixed_portion->working_tree_encoding_len = working_tree_encoding_len;
+	/*
+	 * We use hashcpy() instead of oidcpy() because the hash[] positions
+	 * after `the_hash_algo->rawsz` might not be initialized. And Valgrind
+	 * would complain about passing uninitialized bytes to a syscall
+	 * (write(2)). There is no real harm in this case, but the warning could
+	 * hinder the detection of actual errors.
+	 */
+	hashcpy(fixed_portion->oid.hash, pc_item->ce->oid.hash);
+
+	variant = data + sizeof(*fixed_portion);
+	if (working_tree_encoding_len) {
+		memcpy(variant, working_tree_encoding, working_tree_encoding_len);
+		variant += working_tree_encoding_len;
+	}
+	memcpy(variant, pc_item->ce->name, name_len);
+
+	packet_write(fd, data, len_data);
+
+	free(data);
+}
+
+static void send_batch(int fd, size_t start, size_t nr)
+{
+	size_t i;
+	for (i = 0; i < nr; i++)
+		send_one_item(fd, &parallel_checkout.items[start + i]);
+	packet_flush(fd);
+}
+
+static struct pc_worker *setup_workers(struct checkout *state, int num_workers)
+{
+	struct pc_worker *workers;
+	int i, workers_with_one_extra_item;
+	size_t base_batch_size, batch_beginning = 0;
+
+	ALLOC_ARRAY(workers, num_workers);
+
+	for (i = 0; i < num_workers; i++) {
+		struct child_process *cp = &workers[i].cp;
+
+		child_process_init(cp);
+		cp->git_cmd = 1;
+		cp->in = -1;
+		cp->out = -1;
+		cp->clean_on_exit = 1;
+		strvec_push(&cp->args, "checkout--helper");
+		if (state->base_dir_len)
+			strvec_pushf(&cp->args, "--prefix=%s", state->base_dir);
+		if (start_command(cp))
+			die(_("failed to spawn checkout worker"));
+	}
+
+	base_batch_size = parallel_checkout.nr / num_workers;
+	workers_with_one_extra_item = parallel_checkout.nr % num_workers;
+
+	for (i = 0; i < num_workers; i++) {
+		struct pc_worker *worker = &workers[i];
+		size_t batch_size = base_batch_size;
+
+		/* distribute the extra work evenly */
+		if (i < workers_with_one_extra_item)
+			batch_size++;
+
+		send_batch(worker->cp.in, batch_beginning, batch_size);
+		worker->next_item_to_complete = batch_beginning;
+		worker->nr_items_to_complete = batch_size;
+
+		batch_beginning += batch_size;
+	}
+
+	return workers;
+}
+
+static void finish_workers(struct pc_worker *workers, int num_workers)
+{
+	int i;
+
+	/*
+	 * Close pipes before calling finish_command() to let the workers
+	 * exit asynchronously and avoid spending extra time on wait().
+	 */
+	for (i = 0; i < num_workers; i++) {
+		struct child_process *cp = &workers[i].cp;
+		if (cp->in >= 0)
+			close(cp->in);
+		if (cp->out >= 0)
+			close(cp->out);
+	}
+
+	for (i = 0; i < num_workers; i++) {
+		if (finish_command(&workers[i].cp))
+			error(_("checkout worker %d finished with error"), i);
+	}
+
+	free(workers);
+}
+
+#define ASSERT_PC_ITEM_RESULT_SIZE(got, exp) \
+{ \
+	if (got != exp) \
+		BUG("corrupted result from checkout worker (got %dB, exp %dB)", \
+		    got, exp); \
+} while(0)
+
+static void parse_and_save_result(const char *line, int len,
+				  struct pc_worker *worker)
+{
+	struct pc_item_result *res;
+	struct parallel_checkout_item *pc_item;
+	struct stat *st = NULL;
+
+	if (len < PC_ITEM_RESULT_BASE_SIZE)
+		BUG("too short result from checkout worker (got %dB, exp %dB)",
+		    len, (int)PC_ITEM_RESULT_BASE_SIZE);
+
+	res = (struct pc_item_result *)line;
+
+	/*
+	 * Worker should send either the full result struct on success, or
+	 * just the base (i.e. no stat data), otherwise.
+	 */
+	if (res->status == PC_ITEM_WRITTEN) {
+		ASSERT_PC_ITEM_RESULT_SIZE(len, (int)sizeof(struct pc_item_result));
+		st = &res->st;
+	} else {
+		ASSERT_PC_ITEM_RESULT_SIZE(len, (int)PC_ITEM_RESULT_BASE_SIZE);
+	}
+
+	if (!worker->nr_items_to_complete || res->id != worker->next_item_to_complete)
+		BUG("checkout worker sent unexpected item id");
+
+	worker->next_item_to_complete++;
+	worker->nr_items_to_complete--;
+
+	pc_item = &parallel_checkout.items[res->id];
+	pc_item->status = res->status;
+	if (st)
+		pc_item->st = *st;
+}
+
+
+static void gather_results_from_workers(struct pc_worker *workers,
+					int num_workers)
+{
+	int i, active_workers = num_workers;
+	struct pollfd *pfds;
+
+	CALLOC_ARRAY(pfds, num_workers);
+	for (i = 0; i < num_workers; i++) {
+		pfds[i].fd = workers[i].cp.out;
+		pfds[i].events = POLLIN;
+	}
+
+	while (active_workers) {
+		int nr = poll(pfds, num_workers, -1);
+
+		if (nr < 0) {
+			if (errno == EINTR)
+				continue;
+			die_errno("failed to poll checkout workers");
+		}
+
+		for (i = 0; i < num_workers && nr > 0; i++) {
+			struct pc_worker *worker = &workers[i];
+			struct pollfd *pfd = &pfds[i];
+
+			if (!pfd->revents)
+				continue;
+
+			if (pfd->revents & POLLIN) {
+				int len;
+				const char *line = packet_read_line(pfd->fd, &len);
+
+				if (!line) {
+					pfd->fd = -1;
+					active_workers--;
+				} else {
+					parse_and_save_result(line, len, worker);
+				}
+			} else if (pfd->revents & POLLHUP) {
+				pfd->fd = -1;
+				active_workers--;
+			} else if (pfd->revents & (POLLNVAL | POLLERR)) {
+				die(_("error polling from checkout worker"));
+			}
+
+			nr--;
+		}
+	}
+
+	free(pfds);
+}
+
 static void write_items_sequentially(struct checkout *state)
 {
 	size_t i;
@@ -351,16 +554,28 @@ static void write_items_sequentially(struct checkout *state)
 		write_pc_item(&parallel_checkout.items[i], state);
 }
 
+#define DEFAULT_NUM_WORKERS 2
+
 int run_parallel_checkout(struct checkout *state)
 {
-	int ret;
+	int ret, num_workers = DEFAULT_NUM_WORKERS;
 
 	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
 		BUG("cannot run parallel checkout: uninitialized or already running");
 
 	parallel_checkout.status = PC_RUNNING;
 
-	write_items_sequentially(state);
+	if (parallel_checkout.nr < num_workers)
+		num_workers = parallel_checkout.nr;
+
+	if (num_workers <= 1) {
+		write_items_sequentially(state);
+	} else {
+		struct pc_worker *workers = setup_workers(state, num_workers);
+		gather_results_from_workers(workers, num_workers);
+		finish_workers(workers, num_workers);
+	}
+
 	ret = handle_results(state);
 
 	finish_parallel_checkout();
diff --git a/parallel-checkout.h b/parallel-checkout.h
index 4ad2a519b3..35e5e69a96 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -1,9 +1,14 @@
 #ifndef PARALLEL_CHECKOUT_H
 #define PARALLEL_CHECKOUT_H
 
+#include "convert.h"
+
 struct cache_entry;
 struct checkout;
-struct conv_attrs;
+
+/****************************************************************
+ * Users of parallel checkout
+ ****************************************************************/
 
 enum pc_status {
 	PC_UNINITIALIZED = 0,
@@ -29,4 +34,70 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
 /* Write all the queued entries, returning 0 on success.*/
 int run_parallel_checkout(struct checkout *state);
 
+/****************************************************************
+ * Interface with checkout--helper
+ ****************************************************************/
+
+enum pc_item_status {
+	PC_ITEM_PENDING = 0,
+	PC_ITEM_WRITTEN,
+	/*
+	 * The entry could not be written because there was another file
+	 * already present in its path or leading directories. Since
+	 * checkout_entry_ca() removes such files from the working tree before
+	 * enqueueing the entry for parallel checkout, it means that there was
+	 * a path collision among the entries being written.
+	 */
+	PC_ITEM_COLLIDED,
+	PC_ITEM_FAILED,
+};
+
+struct parallel_checkout_item {
+	/*
+	 * In main process ce points to a istate->cache[] entry. Thus, it's not
+	 * owned by us. In workers they own the memory, which *must be* released.
+	 */
+	struct cache_entry *ce;
+	struct conv_attrs ca;
+	size_t id; /* position in parallel_checkout.items[] of main process */
+
+	/* Output fields, sent from workers. */
+	enum pc_item_status status;
+	struct stat st;
+};
+
+/*
+ * The fixed-size portion of `struct parallel_checkout_item` that is sent to the
+ * workers. Following this will be 2 strings: ca.working_tree_encoding and
+ * ce.name; These are NOT null terminated, since we have the size in the fixed
+ * portion.
+ *
+ * Note that not all fields of conv_attrs and cache_entry are passed, only the
+ * ones that will be required by the workers to smudge and write the entry.
+ */
+struct pc_item_fixed_portion {
+	size_t id;
+	struct object_id oid;
+	unsigned int ce_mode;
+	enum convert_crlf_action crlf_action;
+	int ident;
+	size_t working_tree_encoding_len;
+	size_t name_len;
+};
+
+/*
+ * The fields of `struct parallel_checkout_item` that are returned by the
+ * workers. Note: `st` must be the last one, as it is omitted on error.
+ */
+struct pc_item_result {
+	size_t id;
+	enum pc_item_status status;
+	struct stat st;
+};
+
+#define PC_ITEM_RESULT_BASE_SIZE offsetof(struct pc_item_result, st)
+
+void write_pc_item(struct parallel_checkout_item *pc_item,
+		   struct checkout *state);
+
 #endif /* PARALLEL_CHECKOUT_H */
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 3/5] parallel-checkout: add configuration options
  2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
  2021-03-17 21:12 ` [PATCH 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
  2021-03-17 21:12 ` [PATCH 2/5] parallel-checkout: make it truly parallel Matheus Tavares
@ 2021-03-17 21:12 ` Matheus Tavares
  2021-03-31  4:33   ` Christian Couder
  2021-03-17 21:12 ` [PATCH 4/5] parallel-checkout: support progress displaying Matheus Tavares
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 40+ messages in thread
From: Matheus Tavares @ 2021-03-17 21:12 UTC (permalink / raw)
  To: git; +Cc: christian.couder, gitster, git

Make parallel checkout configurable by introducing two new settings:
checkout.workers and checkout.thresholdForParallelism. The first defines
the number of workers (where one means sequential checkout), and the
second defines the minimum number of entries to attempt parallel
checkout.

To decide the default value for checkout.workers, the parallel version
was benchmarked during three operations in the linux repo, with cold
cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and
checking out v5.8 from v5.7 (checkout II). The four tables below show
the mean run times and standard deviations for 5 runs in: a local file
system on SSD, a local file system on HDD, a Linux NFS server, and
Amazon EFS (all on Linux). Each parallel checkout test was executed with
the number of workers that brings the best overall results in that
environment.

Local SSD:
             Sequential             10 workers            Speedup
Clone        8.805 s ± 0.043 s      3.564 s ± 0.041 s     2.47 ± 0.03
Checkout I   9.678 s ± 0.057 s      4.486 s ± 0.050 s     2.16 ± 0.03
Checkout II  5.034 s ± 0.072 s      3.021 s ± 0.038 s     1.67 ± 0.03

Local HDD:
             Sequential             10 workers             Speedup
Clone        32.288 s ± 0.580 s     30.724 s ± 0.522 s    1.05 ± 0.03
Checkout I   54.172 s ±  7.119 s    54.429 s ± 6.738 s    1.00 ± 0.18
Checkout II  40.465 s ± 2.402 s     38.682 s ± 1.365 s    1.05 ± 0.07

Linux NFS server (v4.1, on EBS, single availability zone):

             Sequential             32 workers            Speedup
Clone        240.368 s ± 6.347 s    57.349 s ± 0.870 s    4.19 ± 0.13
Checkout I   242.862 s ± 2.215 s    58.700 s ± 0.904 s    4.14 ± 0.07
Checkout II  65.751 s ± 1.577 s     23.820 s ± 0.407 s    2.76 ± 0.08

EFS (v4.1, replicated over multiple availability zones):

             Sequential             32 workers            Speedup
Clone        922.321 s ± 2.274 s    210.453 s ± 3.412 s   4.38 ± 0.07
Checkout I   1011.300 s ± 7.346 s   297.828 s ± 0.964 s   3.40 ± 0.03
Checkout II  294.104 s ± 1.836 s    126.017 s ± 1.190 s   2.33 ± 0.03

The above benchmarks show that parallel checkout is most effective on
repositories located on an SSD or over a distributed file system. For
local file systems on spinning disks, and/or older machines, the
parallelism does not always bring a good performance. For this reason,
the default value for checkout.workers is one, a.k.a. sequential
checkout.

To decide the default value for checkout.thresholdForParallelism,
another benchmark was executed in the "Local SSD" setup, where parallel
checkout showed to be beneficial. This time, we compared the runtime of
a `git checkout -f`, with and without parallelism, after randomly
removing an increasing number of files from the Linux working tree. The
"sequential fallback" column bellow corresponds to the executions where
checkout.workers was 10 but checkout.thresholdForParallelism was equal
to the number of to-be-updated files plus one (so that we end up writing
sequentially). Each test case was sampled 15 times, and each sample had
a randomly different set of files removed. Here are the results:

             sequential fallback   10 workers           speedup
10   files    772.3 ms ± 12.6 ms   769.0 ms ± 13.6 ms   1.00 ± 0.02
20   files    780.5 ms ± 15.8 ms   775.2 ms ±  9.2 ms   1.01 ± 0.02
50   files    806.2 ms ± 13.8 ms   767.4 ms ±  8.5 ms   1.05 ± 0.02
100  files    833.7 ms ± 21.4 ms   750.5 ms ± 16.8 ms   1.11 ± 0.04
200  files    897.6 ms ± 30.9 ms   730.5 ms ± 14.7 ms   1.23 ± 0.05
500  files   1035.4 ms ± 48.0 ms   677.1 ms ± 22.3 ms   1.53 ± 0.09
1000 files   1244.6 ms ± 35.6 ms   654.0 ms ± 38.3 ms   1.90 ± 0.12
2000 files   1488.8 ms ± 53.4 ms   658.8 ms ± 23.8 ms   2.26 ± 0.12

From the above numbers, 100 files seems to be a reasonable default value
for the threshold setting.

Note: Up to 1000 files, we observe a drop in the execution time of the
parallel code with an increase in the number of files. This is a rather
odd behavior, but it was observed in multiple repetitions. Above 1000
files, the execution time increases according to the number of files, as
one would expect.

About the test environments: Local SSD tests were executed on an
i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local
HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores
with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running
Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge
instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large
instance with 2 vCPUSs and a 1 TB EBS GP2 volume. Before each timing,
the linux repository was removed (or checked out back to its previous
state), and `sync && sysctl vm.drop_caches=3` was executed.

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/config/checkout.txt | 21 +++++++++++++++++++++
 parallel-checkout.c               | 23 ++++++++++++++++++-----
 parallel-checkout.h               |  9 +++++++--
 unpack-trees.c                    | 10 +++++++---
 4 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/Documentation/config/checkout.txt b/Documentation/config/checkout.txt
index 2cddf7b4b4..bfbca90f0e 100644
--- a/Documentation/config/checkout.txt
+++ b/Documentation/config/checkout.txt
@@ -21,3 +21,24 @@ checkout.guess::
 	Provides the default value for the `--guess` or `--no-guess`
 	option in `git checkout` and `git switch`. See
 	linkgit:git-switch[1] and linkgit:git-checkout[1].
+
+checkout.workers::
+	The number of parallel workers to use when updating the working tree.
+	The default is one, i.e. sequential execution. If set to a value less
+	than one, Git will use as many workers as the number of logical cores
+	available. This setting and `checkout.thresholdForParallelism` affect
+	all commands that perform checkout. E.g. checkout, clone, reset,
+	sparse-checkout, etc.
++
+Note: parallel checkout usually delivers better performance for repositories
+located on SSDs or over NFS. For repositories on spinning disks and/or machines
+with a small number of cores, the default sequential checkout often performs
+better. The size and compression level of a repository might also influence how
+well the parallel version performs.
+
+checkout.thresholdForParallelism::
+	When running parallel checkout with a small number of files, the cost
+	of subprocess spawning and inter-process communication might outweigh
+	the parallelization gains. This setting allows to define the minimum
+	number of files for which parallel checkout should be attempted. The
+	default is 100.
diff --git a/parallel-checkout.c b/parallel-checkout.c
index df447aa3a6..92f3872653 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -1,9 +1,11 @@
 #include "cache.h"
+#include "config.h"
 #include "entry.h"
 #include "parallel-checkout.h"
 #include "pkt-line.h"
 #include "run-command.h"
 #include "streaming.h"
+#include "thread-utils.h"
 
 struct pc_worker {
 	struct child_process cp;
@@ -23,6 +25,19 @@ enum pc_status parallel_checkout_status(void)
 	return parallel_checkout.status;
 }
 
+#define DEFAULT_THRESHOLD_FOR_PARALLELISM 100
+
+void get_parallel_checkout_configs(int *num_workers, int *threshold)
+{
+	if (git_config_get_int("checkout.workers", num_workers))
+		*num_workers = 1;
+	else if (*num_workers < 1)
+		*num_workers = online_cpus();
+
+	if (git_config_get_int("checkout.thresholdForParallelism", threshold))
+		*threshold = DEFAULT_THRESHOLD_FOR_PARALLELISM;
+}
+
 void init_parallel_checkout(void)
 {
 	if (parallel_checkout.status != PC_UNINITIALIZED)
@@ -554,11 +569,9 @@ static void write_items_sequentially(struct checkout *state)
 		write_pc_item(&parallel_checkout.items[i], state);
 }
 
-#define DEFAULT_NUM_WORKERS 2
-
-int run_parallel_checkout(struct checkout *state)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
 {
-	int ret, num_workers = DEFAULT_NUM_WORKERS;
+	int ret;
 
 	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
 		BUG("cannot run parallel checkout: uninitialized or already running");
@@ -568,7 +581,7 @@ int run_parallel_checkout(struct checkout *state)
 	if (parallel_checkout.nr < num_workers)
 		num_workers = parallel_checkout.nr;
 
-	if (num_workers <= 1) {
+	if (num_workers <= 1 || parallel_checkout.nr < threshold) {
 		write_items_sequentially(state);
 	} else {
 		struct pc_worker *workers = setup_workers(state, num_workers);
diff --git a/parallel-checkout.h b/parallel-checkout.h
index 35e5e69a96..26f61ed2ac 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -17,6 +17,7 @@ enum pc_status {
 };
 
 enum pc_status parallel_checkout_status(void);
+void get_parallel_checkout_configs(int *num_workers, int *threshold);
 
 /*
  * Put parallel checkout into the PC_ACCEPTING_ENTRIES state. Should be used
@@ -31,8 +32,12 @@ void init_parallel_checkout(void);
  */
 int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
 
-/* Write all the queued entries, returning 0 on success.*/
-int run_parallel_checkout(struct checkout *state);
+/*
+ * Write all the queued entries, returning 0 on success. If the number of
+ * entries is smaller than the specified threshold, the operation is performed
+ * sequentially.
+ */
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
 
 /****************************************************************
  * Interface with checkout--helper
diff --git a/unpack-trees.c b/unpack-trees.c
index b9548de96a..8bc5061487 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -399,7 +399,7 @@ static int check_updates(struct unpack_trees_options *o,
 	int errs = 0;
 	struct progress *progress;
 	struct checkout state = CHECKOUT_INIT;
-	int i;
+	int i, pc_workers, pc_threshold;
 
 	trace_performance_enter();
 	state.force = 1;
@@ -465,8 +465,11 @@ static int check_updates(struct unpack_trees_options *o,
 		oid_array_clear(&to_fetch);
 	}
 
+	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+
 	enable_delayed_checkout(&state);
-	init_parallel_checkout();
+	if (pc_workers > 1)
+		init_parallel_checkout();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -480,7 +483,8 @@ static int check_updates(struct unpack_trees_options *o,
 		}
 	}
 	stop_progress(&progress);
-	errs |= run_parallel_checkout(&state);
+	if (pc_workers > 1)
+		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
 	errs |= finish_delayed_checkout(&state, NULL);
 	git_attr_set_direction(GIT_ATTR_CHECKIN);
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 4/5] parallel-checkout: support progress displaying
  2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
                   ` (2 preceding siblings ...)
  2021-03-17 21:12 ` [PATCH 3/5] parallel-checkout: add configuration options Matheus Tavares
@ 2021-03-17 21:12 ` Matheus Tavares
  2021-03-17 21:12 ` [PATCH 5/5] parallel-checkout: add design documentation Matheus Tavares
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-03-17 21:12 UTC (permalink / raw)
  To: git; +Cc: christian.couder, gitster, git

Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 parallel-checkout.c | 34 +++++++++++++++++++++++++++++++---
 parallel-checkout.h |  5 ++++-
 unpack-trees.c      | 11 ++++++++---
 3 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/parallel-checkout.c b/parallel-checkout.c
index 92f3872653..2ba0bbb50f 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -3,6 +3,7 @@
 #include "entry.h"
 #include "parallel-checkout.h"
 #include "pkt-line.h"
+#include "progress.h"
 #include "run-command.h"
 #include "streaming.h"
 #include "thread-utils.h"
@@ -16,6 +17,8 @@ struct parallel_checkout {
 	enum pc_status status;
 	struct parallel_checkout_item *items;
 	size_t nr, alloc;
+	struct progress *progress;
+	unsigned int *progress_cnt;
 };
 
 static struct parallel_checkout parallel_checkout;
@@ -131,6 +134,20 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
 	return 0;
 }
 
+size_t pc_queue_size(void)
+{
+	return parallel_checkout.nr;
+}
+
+static void advance_progress_meter(void)
+{
+	if (parallel_checkout.progress) {
+		(*parallel_checkout.progress_cnt)++;
+		display_progress(parallel_checkout.progress,
+				 *parallel_checkout.progress_cnt);
+	}
+}
+
 static int handle_results(struct checkout *state)
 {
 	int ret = 0;
@@ -179,6 +196,7 @@ static int handle_results(struct checkout *state)
 			 */
 			ret |= checkout_entry_ca(pc_item->ce, &pc_item->ca,
 						 state, NULL, NULL);
+			advance_progress_meter();
 			break;
 		case PC_ITEM_PENDING:
 			have_pending = 1;
@@ -506,6 +524,9 @@ static void parse_and_save_result(const char *line, int len,
 	pc_item->status = res->status;
 	if (st)
 		pc_item->st = *st;
+
+	if (res->status != PC_ITEM_COLLIDED)
+		advance_progress_meter();
 }
 
 
@@ -565,11 +586,16 @@ static void write_items_sequentially(struct checkout *state)
 {
 	size_t i;
 
-	for (i = 0; i < parallel_checkout.nr; i++)
-		write_pc_item(&parallel_checkout.items[i], state);
+	for (i = 0; i < parallel_checkout.nr; i++) {
+		struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+		write_pc_item(pc_item, state);
+		if (pc_item->status != PC_ITEM_COLLIDED)
+			advance_progress_meter();
+	}
 }
 
-int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold,
+			  struct progress *progress, unsigned int *progress_cnt)
 {
 	int ret;
 
@@ -577,6 +603,8 @@ int run_parallel_checkout(struct checkout *state, int num_workers, int threshold
 		BUG("cannot run parallel checkout: uninitialized or already running");
 
 	parallel_checkout.status = PC_RUNNING;
+	parallel_checkout.progress = progress;
+	parallel_checkout.progress_cnt = progress_cnt;
 
 	if (parallel_checkout.nr < num_workers)
 		num_workers = parallel_checkout.nr;
diff --git a/parallel-checkout.h b/parallel-checkout.h
index 26f61ed2ac..4114c6bda2 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -5,6 +5,7 @@
 
 struct cache_entry;
 struct checkout;
+struct progress;
 
 /****************************************************************
  * Users of parallel checkout
@@ -31,13 +32,15 @@ void init_parallel_checkout(void);
  * for later write and return 0.
  */
 int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
+size_t pc_queue_size(void);
 
 /*
  * Write all the queued entries, returning 0 on success. If the number of
  * entries is smaller than the specified threshold, the operation is performed
  * sequentially.
  */
-int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold,
+			  struct progress *progress, unsigned int *progress_cnt);
 
 /****************************************************************
  * Interface with checkout--helper
diff --git a/unpack-trees.c b/unpack-trees.c
index 8bc5061487..0e463870fd 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -474,17 +474,22 @@ static int check_updates(struct unpack_trees_options *o,
 		struct cache_entry *ce = index->cache[i];
 
 		if (ce->ce_flags & CE_UPDATE) {
+			size_t last_pc_queue_size = pc_queue_size();
+
 			if (ce->ce_flags & CE_WT_REMOVE)
 				BUG("both update and delete flags are set on %s",
 				    ce->name);
-			display_progress(progress, ++cnt);
 			ce->ce_flags &= ~CE_UPDATE;
 			errs |= checkout_entry(ce, &state, NULL, NULL);
+
+			if (last_pc_queue_size == pc_queue_size())
+				display_progress(progress, ++cnt);
 		}
 	}
-	stop_progress(&progress);
 	if (pc_workers > 1)
-		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
+		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+					      progress, &cnt);
+	stop_progress(&progress);
 	errs |= finish_delayed_checkout(&state, NULL);
 	git_attr_set_direction(GIT_ATTR_CHECKIN);
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 5/5] parallel-checkout: add design documentation
  2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
                   ` (3 preceding siblings ...)
  2021-03-17 21:12 ` [PATCH 4/5] parallel-checkout: support progress displaying Matheus Tavares
@ 2021-03-17 21:12 ` Matheus Tavares
  2021-03-31  5:36   ` Christian Couder
  2021-03-18 20:56 ` [PATCH 0/5] Parallel Checkout (part 2) Junio C Hamano
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 40+ messages in thread
From: Matheus Tavares @ 2021-03-17 21:12 UTC (permalink / raw)
  To: git; +Cc: christian.couder, gitster, git

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/Makefile                        |   1 +
 Documentation/technical/parallel-checkout.txt | 262 ++++++++++++++++++
 2 files changed, 263 insertions(+)
 create mode 100644 Documentation/technical/parallel-checkout.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 81d1bf7a04..af236927c9 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -90,6 +90,7 @@ TECH_DOCS += technical/multi-pack-index
 TECH_DOCS += technical/pack-format
 TECH_DOCS += technical/pack-heuristics
 TECH_DOCS += technical/pack-protocol
+TECH_DOCS += technical/parallel-checkout
 TECH_DOCS += technical/partial-clone
 TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
diff --git a/Documentation/technical/parallel-checkout.txt b/Documentation/technical/parallel-checkout.txt
new file mode 100644
index 0000000000..c7f6570f1c
--- /dev/null
+++ b/Documentation/technical/parallel-checkout.txt
@@ -0,0 +1,262 @@
+Parallel Checkout Design Notes
+==============================
+
+The "Parallel Checkout" feature attempts to use multiple processes to
+parallelize the work of uncompressing, smudging, and writing blobs
+during a checkout operation. It can be used by all checkout-related
+commands, such as `clone`, `checkout`, `reset`, `sparse-checkout`, and
+others.
+
+These commands share the following basic structure:
+
+* Step 1: Read the current index file into memory.
+
+* Step 2: Modify the in-memory index based upon the command, and
+  temporarily mark all cache entries that need to be updated.
+
+* Step 3: Populate the working tree to match the new candidate index.
+  This includes iterating over all of the to-be-updated cache entries
+  and delete, create, or overwrite the associated files in the working
+  tree.
+
+* Step 4: Write the new index to disk.
+
+Step 3 is the focus of the "parallel checkout" effort described here.
+It dominates the execution time for most of the above command types.
+
+Sequential Implementation
+-------------------------
+
+For the purposes of discussion here, the current sequential
+implementation of Step 3 has 3 layers:
+
+* Step 3a: `unpack-trees.c:check_updates()` contains a series of
+  sequential loops iterating over the `cache_entry`'s array. The main
+  loop in this function calls the next layer for each of the
+  to-be-updated entries.
+
+* Step 3b: `entry.c:checkout_entry()` examines the existing working tree
+  for file conflicts, collisions, and unsaved changes. It removes files
+  and create leading directories as necessary. It calls the next layer
+  for each entry to be written.
+
+* Step 3c: `entry.c:write_entry()` loads the blob into memory, smudges
+  it if necessary, creates the file in the working tree, writes the
+  smudged contents, calls `fstat()` or `lstat()`, and updates the
+  associated `cache_entry` struct with the stat information gathered.
+
+It wouldn't be safe to perform Step 3b in parallel, as there could be
+race conditions between file creations and removals. Instead, the
+parallel checkout framework lets the sequential code handle Step 3b,
+and use parallel workers to replace the sequential
+`entry.c:write_entry()` calls from Step 3c.
+
+Rejected Multi-Threaded Solution
+--------------------------------
+
+The most "straightforward" implementation would be to spread the set of
+to-be-updated cache entries across multiple threads. But due to the
+thread-unsafe functions in the ODB code, we would have to use locks to
+coordinate the parallel operation. An early prototype of this solution
+showed that the multi-threaded checkout would bring performance
+improvements over the sequential code, but there was still too much lock
+contention. A `perf` profiling indicated that around 20% of the runtime
+during a local Linux clone (on an SSD) was spent in locking functions.
+For this reason this approach was rejected in favor of using multiple
+child processes, which led to a better performance.
+
+Multi-Process Solution
+----------------------
+
+Parallel checkout alters the aforementioned Step 3 to use multiple
+`checkout--helper` background processes to distribute the work. The
+long-running worker processes are controlled by the foreground Git
+command using the existing run-command API.
+
+Overview
+~~~~~~~~
+
+Step 3b is only slightly altered; for each entry to be checked out, the
+main process:
+
+* M1: Checks whether there is any untracked or unclean file in the
+  working tree which would be overwritten by this entry, and decides
+  whether to proceed (removing the file(s)) or not.
+
+* M2: Creates the leading directories.
+
+* M3: Loads the conversion attributes for the entry's path.
+
+* M4: Checks, based on the entry's type and conversion attributes,
+  whether the entry is eligible for parallel checkout (more on this
+  later). If it is eligible, enqueues the entry and the loaded
+  attributes to later write the entry in parallel. If not, writes the
+  entry right away, using the default sequential code.
+
+Note: we save the conversion attributes associated with each entry
+because the workers don't have access to the main process' index state,
+so they can't load the attributes by themselves (and the attributes are
+needed to properly smudge the entry). Additionally, this has a positive
+impact on performance as (1) we don't need to load the attributes twice
+and (2) the attributes machinery is optimized to handle paths in
+sequential order.
+
+After all entries have passed through the above steps, the main process
+checks if the number of enqueued entries is sufficient to spread among
+the workers. If not, it just writes them sequentially. Otherwise, it
+spawns the workers and distributes the queued entries uniformly in
+continuous chunks. This aims to minimize the chances of two workers
+writing to the same directory simultaneously, which could increase lock
+contention in the kernel.
+
+Then, for each assigned item, each worker:
+
+* W1: Checks if there is any non-directory file in the leading part of
+  the entry's path or if there already exists a file at the entry' path.
+  If so, mark the entry with `PC_ITEM_COLLIDED` and skip it (more on
+  this later).
+
+* W2: Creates the file (with O_CREAT and O_EXCL).
+
+* W3: Loads the blob into memory (inflating and delta reconstructing
+  it).
+
+* W4: Filters the blob.
+
+* W5: Writes the result to the file descriptor opened at W2.
+
+* W6: Calls `fstat()` or lstat()` on the just-written path, and sends
+  the result back to the main process, together with the end status of
+  the operation and the item's identification number.
+
+Note that steps W3 to W5 might actually be performed together, using the
+streaming interface.
+
+Also note that the workers *never* remove any files. As mentioned
+earlier, it is the responsibility of the main process to remove any
+files that block the checkout operation (or abort it). This is crucial
+to avoid race conditions and also to properly detect path collisions at
+Step W1.
+
+After the workers finish writing the items and sending back the required
+information, the main process handles the results in two steps:
+
+- First, it updates the in-memory index with the `lstat()` information
+  sent by the workers. (This must be done first as this information
+  might me required in the following step.)
+
+- Then it writes the items which collided on disk (i.e. items marked
+  with `PC_ITEM_COLLIDED`). More on this below.
+
+Path Collisions
+---------------
+
+Path collisions happen when two different paths correspond to the same
+entry in the file system. E.g. the paths 'a' and 'A' would collide in a
+case-insensitive file system.
+
+The sequential checkout deals with collisions in the same way that it
+deals with files that were already present in the working tree before
+checkout. Basically, it checks if the path that it wants to write
+already exists on disk, makes sure the existing file doesn't have
+unsaved data, and then overwrite it. (To be more pedantic: it deletes
+the existing file and creates the new one.) So, if there are multiple
+colliding files to be checked out, the sequential code will write each
+one of them but only the last will actually survive on disk.
+
+Parallel checkout aims to reproduce the same behavior. However, we
+cannot let the workers racily write to the same file on disk. Instead,
+the workers detect when the entry that they want to check out would
+collide with an existing file, and mark it with `PC_ITEM_COLLIDED`.
+Later, the main process can sequentially feed these entries back to
+`checkout_entry()` without the risk of race conditions. On clone, this
+also has the effect of marking the colliding entries to later emit a
+warning for the user, like the classic sequential checkout does.
+
+The workers are able to detect both collisions among the entries being
+concurrently written and collisions among parallel-eligible and
+ineligible entries. The general idea for collision detection is quite
+straightforward: for each parallel-eligible entry, the main process must
+remove all files that prevent this entry from being written (before
+enqueueing it). This includes any non-directory file in the leading path
+of the entry. Later, when a worker gets assigned the entry, it looks
+again for the non-directories files and for an already existent file at
+the entry's path. If any of these checks finds something, the worker
+knows that there was a path collision.
+
+Because parallel checkout can distinguish path collisions from the case
+where the file was already present in the working tree before checkout,
+we could alternatively choose to skip the checkout of colliding entries.
+However, each entry that doesn't get written would have NULL `lstat()`
+fields on the index. This could cause performance penalties for
+subsequent commands that need to refresh the index, as they would have
+to go to the file system to see if the entry is dirty. Thus, if we have
+N entries in a colliding group and we decide to write and `lstat()` only
+one of them, every subsequent `git-status` will have to read, convert,
+and hash the written file N - 1 times. By checking out all colliding
+entries (like the sequential code does), we only pay the overhead once,
+during checkout.
+
+Eligible Entries for Parallel Checkout
+--------------------------------------
+
+As previously mentioned, not all entries passed to `checkout_entry()`
+will be considered eligible for parallel checkout. More specifically, we
+exclude:
+
+- Symbolic links; to avoid race conditions that, in combination with
+  path collisions, could cause workers to write files at the wrong
+  place. For example, if we were to concurrently check out a symlink
+  'a' -> 'b' and a regular file 'A/f' in a case-insensitive file system,
+  we could potentially end up writing the file 'A/f' at 'a/f', due to a
+  race condition.
+
+- Regular files that require external filters (either "one shot" filters
+  or long-running process filters). These filters are black-boxes to Git
+  and may have their own internal locking or non-concurrent assumptions.
+  So it might not be safe to run multiple instances in parallel.
++
+Besides, long-running filters may use the delayed checkout feature to
+postpone the return of some filtered blobs. The delayed checkout queue
+and the parallel checkout queue are not compatible and should remain
+separated.
+
+Ineligible entries are checked out by the classic sequential codepath
+*before* spawning workers.
+
+Note: submodules's files are also eligible for parallel checkout (as
+long as they don't fall into the two excluding categories mentioned
+above). But since each submodule is checked out in its own child
+process, we don't mix the superproject's and the submodules' files in
+the same parallel checkout process or queue.
+
+The API
+-------
+
+The parallel checkout API was designed with the goal to minimize changes
+to the current users of the checkout machinery. This means that they
+don't have to call a different function for sequential or parallel
+checkout. As already mentioned, `checkout_entry()` will automatically
+insert the given entry in the parallel checkout queue when this feature
+is enabled and the entry is eligible; otherwise, it will just write the
+entry right away, using the sequential code. In general, callers of the
+parallel checkout API should look similar to this:
+
+----------------------------------------------
+int pc_workers, pc_threshold, err = 0;
+struct checkout state;
+
+get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+
+/*
+ * This check is not strictly required, but it
+ * should save some time in sequential mode.
+ */
+if (pc_workers > 1)
+	init_parallel_checkout();
+
+for (each cache_entry ce to-be-updated)
+	err |= checkout_entry(ce, &state, NULL, NULL);
+
+err |= run_parallel_checkout(&state, pc_workers, pc_threshold, NULL, NULL);
+----------------------------------------------
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/5] Parallel Checkout (part 2)
  2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
                   ` (4 preceding siblings ...)
  2021-03-17 21:12 ` [PATCH 5/5] parallel-checkout: add design documentation Matheus Tavares
@ 2021-03-18 20:56 ` Junio C Hamano
  2021-03-19  3:24   ` Matheus Tavares
  2021-03-31  5:42 ` Christian Couder
  2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
  7 siblings, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2021-03-18 20:56 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, christian.couder, git

Matheus Tavares <matheus.bernardino@usp.br> writes:

> This is the next step in the parallel checkout implementation. An
> overview of the complete series can be seen at [1]. 
>
> The last patch in this series adds a design doc, so it may help to
> review it first. Also, there is no need to have any familiarity with
> part-1, as this part doesn't have any semantic dependency with that.
>
> This series is based on the merge of 'mt/parallel-checkout-part-1' and
> 'master', so that it can use the "brew cast" fix and the latest security
> fix (both from master), to run the tests. (The merge is textually
> clean, but it needs a small semantic fix: the '#include "entry.h"'
> addition in builtin/stash.c).

Let's redo part-1 on top of 'master' first without such a merge; it
has been out of 'next' so we can do so easily without wanting for
the tip of 'next' to get rewound.



>
> Parallel-checkout-specific tests will be added in part-3.
>
> [1]: https://lore.kernel.org/git/cover.1604521275.git.matheus.bernardino@usp.br/
>
> Matheus Tavares (5):
>   unpack-trees: add basic support for parallel checkout
>   parallel-checkout: make it truly parallel
>   parallel-checkout: add configuration options
>   parallel-checkout: support progress displaying
>   parallel-checkout: add design documentation
>
>  .gitignore                                    |   1 +
>  Documentation/Makefile                        |   1 +
>  Documentation/config/checkout.txt             |  21 +
>  Documentation/technical/parallel-checkout.txt | 262 ++++++++
>  Makefile                                      |   2 +
>  builtin.h                                     |   1 +
>  builtin/checkout--helper.c                    | 142 ++++
>  entry.c                                       |  17 +-
>  git.c                                         |   2 +
>  parallel-checkout.c                           | 624 ++++++++++++++++++
>  parallel-checkout.h                           | 111 ++++
>  unpack-trees.c                                |  19 +-
>  12 files changed, 1198 insertions(+), 5 deletions(-)
>  create mode 100644 Documentation/technical/parallel-checkout.txt
>  create mode 100644 builtin/checkout--helper.c
>  create mode 100644 parallel-checkout.c
>  create mode 100644 parallel-checkout.h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/5] Parallel Checkout (part 2)
  2021-03-18 20:56 ` [PATCH 0/5] Parallel Checkout (part 2) Junio C Hamano
@ 2021-03-19  3:24   ` Matheus Tavares
  2021-03-19 22:58     ` Junio C Hamano
  0 siblings, 1 reply; 40+ messages in thread
From: Matheus Tavares @ 2021-03-19  3:24 UTC (permalink / raw)
  To: gitster; +Cc: git, christian.couder, git

On Thu, Mar 18, 2021 at 5:56 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > This is the next step in the parallel checkout implementation. An
> > overview of the complete series can be seen at [1].
> >
> > The last patch in this series adds a design doc, so it may help to
> > review it first. Also, there is no need to have any familiarity with
> > part-1, as this part doesn't have any semantic dependency with that.
> >
> > This series is based on the merge of 'mt/parallel-checkout-part-1' and
> > 'master', so that it can use the "brew cast" fix and the latest security
> > fix (both from master), to run the tests. (The merge is textually
> > clean, but it needs a small semantic fix: the '#include "entry.h"'
> > addition in builtin/stash.c).
>
> Let's redo part-1 on top of 'master' first without such a merge; it
> has been out of 'next' so we can do so easily without wanting for
> the tip of 'next' to get rewound.

Thanks!

I saw you've added the "entry.h" inclusion that was missing at
builtin/stash when merging this branch on 'seen'. However, now that
part-1 is based on 'master', the branch is no longer buildable without
this fix. So could we perhaps squash this change directly into the
relevant commit in this series? (I'm asking this mainly to allow me to
later base part-3 directly on part-2; without having to base it on a
merge again.)

If you agree, here is a fixup! to be squashed into part-1, for
convenience:

-- >8 --
Subject: [PATCH] fixup! entry: extract a header file for entry.c functions

---
 builtin/stash.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/stash.c b/builtin/stash.c
index ba774cce67..11f3ae3039 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -15,6 +15,7 @@
 #include "log-tree.h"
 #include "diffcore.h"
 #include "exec-cmd.h"
+#include "entry.h"
 
 #define INCLUDE_ALL_FILES 2
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/5] Parallel Checkout (part 2)
  2021-03-19  3:24   ` Matheus Tavares
@ 2021-03-19 22:58     ` Junio C Hamano
  0 siblings, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2021-03-19 22:58 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, christian.couder, git

Matheus Tavares <matheus.bernardino@usp.br> writes:

>> Let's redo part-1 on top of 'master' first without such a merge; it
>> has been out of 'next' so we can do so easily without wanting for
>> the tip of 'next' to get rewound.
>
> Thanks!
>
> I saw you've added the "entry.h" inclusion that was missing at
> builtin/stash when merging this branch on 'seen'. However, now that
> part-1 is based on 'master', the branch is no longer buildable without
> this fix. So could we perhaps squash this change directly into the
> relevant commit in this series?

Yeah, that was the kind of thing I had in mind when I suggested you
to "Let's redo part-1 on top of 'master'".

I'll mark the topic to be "expecting a reroll" for now, as I am deep
in today's integration cycle.

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/5] unpack-trees: add basic support for parallel checkout
  2021-03-17 21:12 ` [PATCH 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
@ 2021-03-31  4:22   ` Christian Couder
  2021-04-02 14:39     ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 40+ messages in thread
From: Christian Couder @ 2021-03-31  4:22 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, Junio C Hamano, Jeff Hostetler

On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:

> +struct parallel_checkout_item {
> +       /* pointer to a istate->cache[] entry. Not owned by us. */
> +       struct cache_entry *ce;
> +       struct conv_attrs ca;
> +       struct stat st;
> +       enum pc_item_status status;
> +};
> +
> +struct parallel_checkout {
> +       enum pc_status status;
> +       struct parallel_checkout_item *items;
> +       size_t nr, alloc;
> +};

It looks like this struct parallel_checkout contains what is called
the parallel checkout queue in many places. It would be a bit better
if a name or at least a comment could make that clear.

> +static struct parallel_checkout parallel_checkout;

[...]

> +static void finish_parallel_checkout(void)
> +{
> +       if (parallel_checkout.status == PC_UNINITIALIZED)
> +               BUG("cannot finish parallel checkout: not initialized yet");
> +
> +       free(parallel_checkout.items);

Ok, it looks like the queue can indeed be freed like this as the items
don't need their fields to be freed.

> +       memset(&parallel_checkout, 0, sizeof(parallel_checkout));

That should work to reset it.

> +}
> +
> +static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
> +                                            const struct conv_attrs *ca)
> +{
> +       enum conv_attrs_classification c;
> +
> +       /*
> +        * Symlinks cannot be checked out in parallel as, in case of path
> +        * collision, they could racily replace leading directories of other
> +        * entries being checked out. Submodules are checked out in child
> +        * processes, which have their own parallel checkout queues.
> +        */
> +       if (!S_ISREG(ce->ce_mode))
> +               return 0;
> +
> +       c = classify_conv_attrs(ca);
> +       switch (c) {
> +       case CA_CLASS_INCORE:
> +               return 1;
> +
> +       case CA_CLASS_INCORE_FILTER:
> +               /*
> +                * It would be safe to allow concurrent instances of
> +                * single-file smudge filters, like rot13, but we should not
> +                * assume that all filters are parallel-process safe. So we
> +                * don't allow this.
> +                */
> +               return 0;
> +
> +       case CA_CLASS_INCORE_PROCESS:
> +               /*
> +                * The parallel queue and the delayed queue are not compatible,
> +                * so they must be kept completely separated. And we can't tell
> +                * if a long-running process will delay its response without
> +                * actually asking it to perform the filtering. Therefore, this
> +                * type of filter is not allowed in parallel checkout.
> +                *
> +                * Furthermore, there should only be one instance of the
> +                * long-running process filter as we don't know how it is
> +                * managing its own concurrency. So, spreading the entries that
> +                * requisite such a filter among the parallel workers would
> +                * require a lot more inter-process communication. We would
> +                * probably have to designate a single process to interact with
> +                * the filter and send all the necessary data to it, for each
> +                * entry.
> +                */
> +               return 0;

So it looks like we don't process entries that need filtering. It's a
bit disappointing as the commit message says:

"This new interface allows us to enqueue some of the entries being
checked out to later uncompress, smudge, and write them in parallel."

So it seems to say that entries could be smudged in parallel. Or maybe
I am missing something?

> +       case CA_CLASS_STREAMABLE:
> +               return 1;
> +
> +       default:
> +               BUG("unsupported conv_attrs classification '%d'", c);
> +       }
> +}

[...]

> +static int handle_results(struct checkout *state)
> +{
> +       int ret = 0;
> +       size_t i;
> +       int have_pending = 0;
> +
> +       /*
> +        * We first update the successfully written entries with the collected
> +        * stat() data, so that they can be found by mark_colliding_entries(),
> +        * in the next loop, when necessary.
> +        */
> +       for (i = 0; i < parallel_checkout.nr; i++) {
> +               struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
> +               if (pc_item->status == PC_ITEM_WRITTEN)
> +                       update_ce_after_write(state, pc_item->ce, &pc_item->st);
> +       }
> +
> +       for (i = 0; i < parallel_checkout.nr; i++) {
> +               struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
> +
> +               switch(pc_item->status) {
> +               case PC_ITEM_WRITTEN:
> +                       /* Already handled */
> +                       break;
> +               case PC_ITEM_COLLIDED:
> +                       /*
> +                        * The entry could not be checked out due to a path
> +                        * collision with another entry. Since there can only
> +                        * be one entry of each colliding group on the disk, we
> +                        * could skip trying to check out this one and move on.
> +                        * However, this would leave the unwritten entries with
> +                        * null stat() fields on the index, which could
> +                        * potentially slow down subsequent operations that
> +                        * require refreshing it: git would not be able to
> +                        * trust st_size and would have to go to the filesystem
> +                        * to see if the contents match (see ie_modified()).
> +                        *
> +                        * Instead, let's pay the overhead only once, now, and
> +                        * call checkout_entry_ca() again for this file, to
> +                        * have it's stat() data stored in the index. This also

s/it's/its/

> +                        * has the benefit of adding this entry and its
> +                        * colliding pair to the collision report message.
> +                        * Additionally, this overwriting behavior is consistent
> +                        * with what the sequential checkout does, so it doesn't
> +                        * add any extra overhead.
> +                        */
> +                       ret |= checkout_entry_ca(pc_item->ce, &pc_item->ca,
> +                                                state, NULL, NULL);
> +                       break;
> +               case PC_ITEM_PENDING:
> +                       have_pending = 1;
> +                       /* fall through */
> +               case PC_ITEM_FAILED:
> +                       ret = -1;
> +                       break;
> +               default:
> +                       BUG("unknown checkout item status in parallel checkout");
> +               }
> +       }
> +
> +       if (have_pending)
> +               error(_("parallel checkout finished with pending entries"));

Is this an error that can actually happen? Or is it a bug? If this can
actually happen what is the state of the working directory and what
can the user do? Start another checkout with an option to do it
sequentially?

> +       return ret;
> +}

[...]

> +static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
> +                              const char *path)
> +{
> +       int ret;
> +       struct stream_filter *filter;
> +       struct strbuf buf = STRBUF_INIT;
> +       char *new_blob;
> +       unsigned long size;

I thought that we shouldn't use unsigned long anymore for sizes, but
unfortunately it looks like that's what read_blob_entry() expects.

> +       size_t newsize = 0;

I don't think we need to initialize newsize. Also we could perhaps
declare it closer to where it is used below.

> +       ssize_t wrote;
> +
> +       /* Sanity check */
> +       assert(is_eligible_for_parallel_checkout(pc_item->ce, &pc_item->ca));
> +
> +       filter = get_stream_filter_ca(&pc_item->ca, &pc_item->ce->oid);
> +       if (filter) {
> +               if (stream_blob_to_fd(fd, &pc_item->ce->oid, filter, 1)) {
> +                       /* On error, reset fd to try writing without streaming */
> +                       if (reset_fd(fd, path))
> +                               return -1;
> +               } else {
> +                       return 0;
> +               }
> +       }
> +
> +       new_blob = read_blob_entry(pc_item->ce, &size);
> +       if (!new_blob)
> +               return error("unable to read sha1 file of %s (%s)", path,

With the support of sha256, I guess we should avoid talking about
sha1. Maybe: s/sha1/object/ Or maybe we could just say that we
couldn't read the object, as read_blob_entry() would perhaps read it
from a pack-file and not from a loose file?

> +                            oid_to_hex(&pc_item->ce->oid));
> +
> +       /*
> +        * checkout metadata is used to give context for external process
> +        * filters. Files requiring such filters are not eligible for parallel
> +        * checkout, so pass NULL.
> +        */
> +       ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
> +                                        new_blob, size, &buf, NULL);
> +
> +       if (ret) {
> +               free(new_blob);
> +               new_blob = strbuf_detach(&buf, &newsize);
> +               size = newsize;

"newsize" seems to be used only here. It would be nice if we could get
rid of it and pass "&size" to strbuf_detach(), but maybe we cannot
because of the unsigned long/size_t type mismatch. At least we could
declare "newsize" in this scope though.

> +       }
> +
> +       wrote = write_in_full(fd, new_blob, size);
> +       free(new_blob);
> +       if (wrote < 0)
> +               return error("unable to write file %s", path);
> +
> +       return 0;
> +}
> +

[...]

> +static void write_pc_item(struct parallel_checkout_item *pc_item,
> +                         struct checkout *state)
> +{
> +       unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;

It looks like we are using similar code in different places, so it's
probably ok, but maybe it would be nice one day to use a macro or an
inline function for this.

> +       int fd = -1, fstat_done = 0;
> +       struct strbuf path = STRBUF_INIT;
> +       const char *dir_sep;

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/5] parallel-checkout: make it truly parallel
  2021-03-17 21:12 ` [PATCH 2/5] parallel-checkout: make it truly parallel Matheus Tavares
@ 2021-03-31  4:32   ` Christian Couder
  2021-04-02 14:42     ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 40+ messages in thread
From: Christian Couder @ 2021-03-31  4:32 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, Junio C Hamano, Jeff Hostetler

On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:

> diff --git a/.gitignore b/.gitignore
> index 3dcdb6bb5a..26f8ddfc55 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -33,6 +33,7 @@
>  /git-check-mailmap
>  /git-check-ref-format
>  /git-checkout
> +/git-checkout--helper

I wonder if "checkout--worker" would be better than "checkout--helper".

>  /git-checkout-index
>  /git-cherry
>  /git-cherry-pick

[...]

> +#define ASSERT_PC_ITEM_RESULT_SIZE(got, exp) \
> +{ \
> +       if (got != exp) \
> +               BUG("corrupted result from checkout worker (got %dB, exp %dB)", \

Maybe precompilers are smart enough to not replace the "got" and "exp"
in the above string, but it might be a bit confusing for readers.
Anway I wonder if this macro could just be a regular (possibly inline)
function.

> +                   got, exp); \
> +} while(0)

> +static void parse_and_save_result(const char *line, int len,
> +                                 struct pc_worker *worker)
> +{
> +       struct pc_item_result *res;
> +       struct parallel_checkout_item *pc_item;
> +       struct stat *st = NULL;
> +
> +       if (len < PC_ITEM_RESULT_BASE_SIZE)
> +               BUG("too short result from checkout worker (got %dB, exp %dB)",
> +                   len, (int)PC_ITEM_RESULT_BASE_SIZE);
> +
> +       res = (struct pc_item_result *)line;
> +
> +       /*
> +        * Worker should send either the full result struct on success, or
> +        * just the base (i.e. no stat data), otherwise.
> +        */
> +       if (res->status == PC_ITEM_WRITTEN) {
> +               ASSERT_PC_ITEM_RESULT_SIZE(len, (int)sizeof(struct pc_item_result));
> +               st = &res->st;
> +       } else {
> +               ASSERT_PC_ITEM_RESULT_SIZE(len, (int)PC_ITEM_RESULT_BASE_SIZE);
> +       }
> +
> +       if (!worker->nr_items_to_complete || res->id != worker->next_item_to_complete)

Nit: maybe it could be useful to distinguish between these 2 potential
bugs and have a specific BUG() for each one.

> +               BUG("checkout worker sent unexpected item id");

> +       worker->next_item_to_complete++;
> +       worker->nr_items_to_complete--;
> +
> +       pc_item = &parallel_checkout.items[res->id];
> +       pc_item->status = res->status;
> +       if (st)
> +               pc_item->st = *st;
> +}
> +
> +

One blank line is enough.

> +static void gather_results_from_workers(struct pc_worker *workers,
> +                                       int num_workers)
> +{
> +       int i, active_workers = num_workers;
> +       struct pollfd *pfds;
> +
> +       CALLOC_ARRAY(pfds, num_workers);
> +       for (i = 0; i < num_workers; i++) {
> +               pfds[i].fd = workers[i].cp.out;
> +               pfds[i].events = POLLIN;
> +       }
> +
> +       while (active_workers) {
> +               int nr = poll(pfds, num_workers, -1);
> +
> +               if (nr < 0) {
> +                       if (errno == EINTR)
> +                               continue;
> +                       die_errno("failed to poll checkout workers");
> +               }
> +
> +               for (i = 0; i < num_workers && nr > 0; i++) {

Is it possible that nr is 0? If that happens, it looks like we would
be in an infinite `while (active_workers) { ... }` loop.

Actually in poll(2) there is: "A value of 0 indicates that the call
timed out and no file descriptors were ready." So it seems that it
could, at least theorically, happen.

> +                       struct pc_worker *worker = &workers[i];
> +                       struct pollfd *pfd = &pfds[i];
> +
> +                       if (!pfd->revents)
> +                               continue;
> +
> +                       if (pfd->revents & POLLIN) {
> +                               int len;
> +                               const char *line = packet_read_line(pfd->fd, &len);
> +
> +                               if (!line) {
> +                                       pfd->fd = -1;
> +                                       active_workers--;
> +                               } else {
> +                                       parse_and_save_result(line, len, worker);
> +                               }
> +                       } else if (pfd->revents & POLLHUP) {
> +                               pfd->fd = -1;
> +                               active_workers--;
> +                       } else if (pfd->revents & (POLLNVAL | POLLERR)) {
> +                               die(_("error polling from checkout worker"));
> +                       }
> +
> +                       nr--;
> +               }
> +       }
> +
> +       free(pfds);
> +}

[...]

> +enum pc_item_status {
> +       PC_ITEM_PENDING = 0,
> +       PC_ITEM_WRITTEN,
> +       /*
> +        * The entry could not be written because there was another file
> +        * already present in its path or leading directories. Since
> +        * checkout_entry_ca() removes such files from the working tree before
> +        * enqueueing the entry for parallel checkout, it means that there was
> +        * a path collision among the entries being written.
> +        */
> +       PC_ITEM_COLLIDED,
> +       PC_ITEM_FAILED,
> +};
> +
> +struct parallel_checkout_item {
> +       /*
> +        * In main process ce points to a istate->cache[] entry. Thus, it's not
> +        * owned by us. In workers they own the memory, which *must be* released.
> +        */
> +       struct cache_entry *ce;
> +       struct conv_attrs ca;
> +       size_t id; /* position in parallel_checkout.items[] of main process */
> +
> +       /* Output fields, sent from workers. */
> +       enum pc_item_status status;
> +       struct stat st;
> +};

Maybe the previous patch could have declared both 'enum
pc_item_status' and 'struct parallel_checkout_item' here, in
parallel-checkout.h, so that this patch wouldn't need to move them
here.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/5] parallel-checkout: add configuration options
  2021-03-17 21:12 ` [PATCH 3/5] parallel-checkout: add configuration options Matheus Tavares
@ 2021-03-31  4:33   ` Christian Couder
  2021-04-02 14:45     ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 40+ messages in thread
From: Christian Couder @ 2021-03-31  4:33 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, Junio C Hamano, Jeff Hostetler

On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:

> The above benchmarks show that parallel checkout is most effective on
> repositories located on an SSD or over a distributed file system. For
> local file systems on spinning disks, and/or older machines, the
> parallelism does not always bring a good performance. For this reason,
> the default value for checkout.workers is one, a.k.a. sequential
> checkout.

I wonder how many people are still using HDD, and how many will still
use them in a few years.

I think having 1 as the default value for checkout.workers might be
good for a while for backward compatibility and stability, while
people who are interested can test parallel checkout on different
environments. But we might want, in a few releases, after some bugs,
if any, have been fixed, to use a default, maybe 10, that will provide
significant speedup for most people, and will justify the added
complexity, especially as your numbers still show a small speedup for
HDD when using 10.

> To decide the default value for checkout.thresholdForParallelism,
> another benchmark was executed in the "Local SSD" setup, where parallel
> checkout showed to be beneficial. This time, we compared the runtime of
> a `git checkout -f`, with and without parallelism, after randomly
> removing an increasing number of files from the Linux working tree. The
> "sequential fallback" column bellow corresponds to the executions where

s/bellow/below/

> checkout.workers was 10 but checkout.thresholdForParallelism was equal
> to the number of to-be-updated files plus one (so that we end up writing
> sequentially). Each test case was sampled 15 times, and each sample had
> a randomly different set of files removed. Here are the results:
>
>              sequential fallback   10 workers           speedup
> 10   files    772.3 ms ± 12.6 ms   769.0 ms ± 13.6 ms   1.00 ± 0.02
> 20   files    780.5 ms ± 15.8 ms   775.2 ms ±  9.2 ms   1.01 ± 0.02
> 50   files    806.2 ms ± 13.8 ms   767.4 ms ±  8.5 ms   1.05 ± 0.02
> 100  files    833.7 ms ± 21.4 ms   750.5 ms ± 16.8 ms   1.11 ± 0.04
> 200  files    897.6 ms ± 30.9 ms   730.5 ms ± 14.7 ms   1.23 ± 0.05
> 500  files   1035.4 ms ± 48.0 ms   677.1 ms ± 22.3 ms   1.53 ± 0.09
> 1000 files   1244.6 ms ± 35.6 ms   654.0 ms ± 38.3 ms   1.90 ± 0.12
> 2000 files   1488.8 ms ± 53.4 ms   658.8 ms ± 23.8 ms   2.26 ± 0.12
>
> From the above numbers, 100 files seems to be a reasonable default value
> for the threshold setting.
>
> Note: Up to 1000 files, we observe a drop in the execution time of the
> parallel code with an increase in the number of files. This is a rather
> odd behavior, but it was observed in multiple repetitions. Above 1000
> files, the execution time increases according to the number of files, as
> one would expect.
>
> About the test environments: Local SSD tests were executed on an
> i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local
> HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores
> with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running
> Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge
> instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large
> instance with 2 vCPUSs and a 1 TB EBS GP2 volume. Before each timing,
> the linux repository was removed (or checked out back to its previous
> state), and `sync && sysctl vm.drop_caches=3` was executed.

Thanks for all these tests and details!

> Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>

[...]

> @@ -23,6 +25,19 @@ enum pc_status parallel_checkout_status(void)
>         return parallel_checkout.status;
>  }
>
> +#define DEFAULT_THRESHOLD_FOR_PARALLELISM 100

Using a "static const int" might be a bit better.

> +void get_parallel_checkout_configs(int *num_workers, int *threshold)
> +{
> +       if (git_config_get_int("checkout.workers", num_workers))
> +               *num_workers = 1;

I think it's better when an important default value like this "1" is
made more visible using a "static const int" or a "#define".

> +       else if (*num_workers < 1)
> +               *num_workers = online_cpus();
> +
> +       if (git_config_get_int("checkout.thresholdForParallelism", threshold))
> +               *threshold = DEFAULT_THRESHOLD_FOR_PARALLELISM;
> +}
> +
>  void init_parallel_checkout(void)
>  {
>         if (parallel_checkout.status != PC_UNINITIALIZED)
> @@ -554,11 +569,9 @@ static void write_items_sequentially(struct checkout *state)
>                 write_pc_item(&parallel_checkout.items[i], state);
>  }
>
> -#define DEFAULT_NUM_WORKERS 2

As I say above, it doesn't look like a good idea to have removed this.

> -int run_parallel_checkout(struct checkout *state)
> +int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
>  {
> -       int ret, num_workers = DEFAULT_NUM_WORKERS;
> +       int ret;
>
>         if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
>                 BUG("cannot run parallel checkout: uninitialized or already running");
> @@ -568,7 +581,7 @@ int run_parallel_checkout(struct checkout *state)
>         if (parallel_checkout.nr < num_workers)
>                 num_workers = parallel_checkout.nr;
>
> -       if (num_workers <= 1) {
> +       if (num_workers <= 1 || parallel_checkout.nr < threshold) {

Here we check the number of workers...

>                 write_items_sequentially(state);
>         } else {
>                 struct pc_worker *workers = setup_workers(state, num_workers);

[...]

> @@ -480,7 +483,8 @@ static int check_updates(struct unpack_trees_options *o,
>                 }
>         }
>         stop_progress(&progress);
> -       errs |= run_parallel_checkout(&state);
> +       if (pc_workers > 1)

...but the number of workers was already checked here.


> +               errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
>         errs |= finish_delayed_checkout(&state, NULL);
>         git_attr_set_direction(GIT_ATTR_CHECKIN);

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/5] parallel-checkout: add design documentation
  2021-03-17 21:12 ` [PATCH 5/5] parallel-checkout: add design documentation Matheus Tavares
@ 2021-03-31  5:36   ` Christian Couder
  0 siblings, 0 replies; 40+ messages in thread
From: Christian Couder @ 2021-03-31  5:36 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, Junio C Hamano, Jeff Hostetler

On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:

> +For the purposes of discussion here, the current sequential
> +implementation of Step 3 has 3 layers:

You refer to these layers as "steps" below, so you might want to use
"sub-steps" instead of "layers".

> +* Step 3a: `unpack-trees.c:check_updates()` contains a series of
> +  sequential loops iterating over the `cache_entry`'s array. The main
> +  loop in this function calls the next layer for each of the

Not sure what "layer" means here. Does it mean Step 3b below? In this
case I would suggest using "Step 3b" instead of "layer".

> +  to-be-updated entries.
> +
> +* Step 3b: `entry.c:checkout_entry()` examines the existing working tree
> +  for file conflicts, collisions, and unsaved changes. It removes files
> +  and create leading directories as necessary. It calls the next layer

s/create/creates/

I guess the "next layer" is Step 3c below.

> +  for each entry to be written.
> +
> +* Step 3c: `entry.c:write_entry()` loads the blob into memory, smudges
> +  it if necessary, creates the file in the working tree, writes the
> +  smudged contents, calls `fstat()` or `lstat()`, and updates the
> +  associated `cache_entry` struct with the stat information gathered.
> +
> +It wouldn't be safe to perform Step 3b in parallel, as there could be
> +race conditions between file creations and removals. Instead, the
> +parallel checkout framework lets the sequential code handle Step 3b,
> +and use parallel workers to replace the sequential
> +`entry.c:write_entry()` calls from Step 3c.

Ok.

> +Rejected Multi-Threaded Solution
> +--------------------------------
> +
> +The most "straightforward" implementation would be to spread the set of
> +to-be-updated cache entries across multiple threads. But due to the
> +thread-unsafe functions in the ODB code, we would have to use locks to
> +coordinate the parallel operation. An early prototype of this solution
> +showed that the multi-threaded checkout would bring performance
> +improvements over the sequential code, but there was still too much lock
> +contention. A `perf` profiling indicated that around 20% of the runtime
> +during a local Linux clone (on an SSD) was spent in locking functions.
> +For this reason this approach was rejected in favor of using multiple
> +child processes, which led to a better performance.

Nice explanation.

> +Multi-Process Solution
> +----------------------
> +
> +Parallel checkout alters the aforementioned Step 3 to use multiple
> +`checkout--helper` background processes to distribute the work. The
> +long-running worker processes are controlled by the foreground Git
> +command using the existing run-command API.
> +
> +Overview
> +~~~~~~~~
> +
> +Step 3b is only slightly altered; for each entry to be checked out, the
> +main process:

Maybe: s/main process:/main process performs the following steps:/

If you apply this suggestion, you may also want the following below:

s/M1: Checks/M1: Check/
s/and decides/and decide/
s/M2: Creates/M2: Create/
...

> +* M1: Checks whether there is any untracked or unclean file in the
> +  working tree which would be overwritten by this entry, and decides
> +  whether to proceed (removing the file(s)) or not.
> +
> +* M2: Creates the leading directories.
> +
> +* M3: Loads the conversion attributes for the entry's path.
> +
> +* M4: Checks, based on the entry's type and conversion attributes,
> +  whether the entry is eligible for parallel checkout (more on this
> +  later). If it is eligible, enqueues the entry and the loaded
> +  attributes to later write the entry in parallel. If not, writes the
> +  entry right away, using the default sequential code.
> +
> +Note: we save the conversion attributes associated with each entry
> +because the workers don't have access to the main process' index state,
> +so they can't load the attributes by themselves (and the attributes are
> +needed to properly smudge the entry). Additionally, this has a positive
> +impact on performance as (1) we don't need to load the attributes twice
> +and (2) the attributes machinery is optimized to handle paths in
> +sequential order.

Nice!

> +After all entries have passed through the above steps, the main process
> +checks if the number of enqueued entries is sufficient to spread among
> +the workers. If not, it just writes them sequentially. Otherwise, it
> +spawns the workers and distributes the queued entries uniformly in
> +continuous chunks. This aims to minimize the chances of two workers
> +writing to the same directory simultaneously, which could increase lock
> +contention in the kernel.
> +
> +Then, for each assigned item, each worker:
> +
> +* W1: Checks if there is any non-directory file in the leading part of
> +  the entry's path or if there already exists a file at the entry' path.
> +  If so, mark the entry with `PC_ITEM_COLLIDED` and skip it (more on
> +  this later).
> +
> +* W2: Creates the file (with O_CREAT and O_EXCL).
> +
> +* W3: Loads the blob into memory (inflating and delta reconstructing
> +  it).
> +
> +* W4: Filters the blob.

Not sure what "Filters" means here. Is this related to the smudge filter?

> +* W5: Writes the result to the file descriptor opened at W2.
> +
> +* W6: Calls `fstat()` or lstat()` on the just-written path, and sends
> +  the result back to the main process, together with the end status of
> +  the operation and the item's identification number.
> +
> +Note that steps W3 to W5 might actually be performed together, using the
> +streaming interface.

Not sure what "performed together" means here. Does it mean by a
single function or set of functions?

> +Also note that the workers *never* remove any files. As mentioned

Maybe: s/any files/any file/

> +earlier, it is the responsibility of the main process to remove any
> +files that block the checkout operation (or abort it). This is crucial

Maybe: s/files/file/ and s/block/blocks/ and s/abort/aborts/

> +to avoid race conditions and also to properly detect path collisions at
> +Step W1.
> +
> +After the workers finish writing the items and sending back the required
> +information, the main process handles the results in two steps:
> +
> +- First, it updates the in-memory index with the `lstat()` information
> +  sent by the workers. (This must be done first as this information
> +  might me required in the following step.)
> +
> +- Then it writes the items which collided on disk (i.e. items marked
> +  with `PC_ITEM_COLLIDED`). More on this below.
> +
> +Path Collisions
> +---------------
> +
> +Path collisions happen when two different paths correspond to the same
> +entry in the file system. E.g. the paths 'a' and 'A' would collide in a
> +case-insensitive file system.
> +
> +The sequential checkout deals with collisions in the same way that it
> +deals with files that were already present in the working tree before
> +checkout. Basically, it checks if the path that it wants to write
> +already exists on disk, makes sure the existing file doesn't have
> +unsaved data, and then overwrite it. (To be more pedantic: it deletes

s/overwrite/overwrites/

> +the existing file and creates the new one.) So, if there are multiple
> +colliding files to be checked out, the sequential code will write each
> +one of them but only the last will actually survive on disk.
> +
> +Parallel checkout aims to reproduce the same behavior. However, we
> +cannot let the workers racily write to the same file on disk. Instead,
> +the workers detect when the entry that they want to check out would
> +collide with an existing file, and mark it with `PC_ITEM_COLLIDED`.
> +Later, the main process can sequentially feed these entries back to
> +`checkout_entry()` without the risk of race conditions. On clone, this
> +also has the effect of marking the colliding entries to later emit a
> +warning for the user, like the classic sequential checkout does.
> +
> +The workers are able to detect both collisions among the entries being
> +concurrently written and collisions among parallel-eligible and
> +ineligible entries. The general idea for collision detection is quite
> +straightforward: for each parallel-eligible entry, the main process must
> +remove all files that prevent this entry from being written (before
> +enqueueing it). This includes any non-directory file in the leading path
> +of the entry. Later, when a worker gets assigned the entry, it looks
> +again for the non-directories files and for an already existent file at

Maybe: s/existent/existing/

> +the entry's path. If any of these checks finds something, the worker
> +knows that there was a path collision.
> +
> +Because parallel checkout can distinguish path collisions from the case
> +where the file was already present in the working tree before checkout,
> +we could alternatively choose to skip the checkout of colliding entries.
> +However, each entry that doesn't get written would have NULL `lstat()`
> +fields on the index. This could cause performance penalties for
> +subsequent commands that need to refresh the index, as they would have
> +to go to the file system to see if the entry is dirty. Thus, if we have
> +N entries in a colliding group and we decide to write and `lstat()` only
> +one of them, every subsequent `git-status` will have to read, convert,
> +and hash the written file N - 1 times. By checking out all colliding
> +entries (like the sequential code does), we only pay the overhead once,
> +during checkout.
> +
> +Eligible Entries for Parallel Checkout
> +--------------------------------------
> +
> +As previously mentioned, not all entries passed to `checkout_entry()`
> +will be considered eligible for parallel checkout. More specifically, we
> +exclude:
> +
> +- Symbolic links; to avoid race conditions that, in combination with
> +  path collisions, could cause workers to write files at the wrong
> +  place. For example, if we were to concurrently check out a symlink
> +  'a' -> 'b' and a regular file 'A/f' in a case-insensitive file system,
> +  we could potentially end up writing the file 'A/f' at 'a/f', due to a
> +  race condition.
> +
> +- Regular files that require external filters (either "one shot" filters
> +  or long-running process filters). These filters are black-boxes to Git
> +  and may have their own internal locking or non-concurrent assumptions.
> +  So it might not be safe to run multiple instances in parallel.
> ++
> +Besides, long-running filters may use the delayed checkout feature to
> +postpone the return of some filtered blobs. The delayed checkout queue
> +and the parallel checkout queue are not compatible and should remain
> +separated.

Are files that require some other internal filters eligible though?

> +Ineligible entries are checked out by the classic sequential codepath
> +*before* spawning workers.
> +
> +Note: submodules's files are also eligible for parallel checkout (as
> +long as they don't fall into the two excluding categories mentioned
> +above). But since each submodule is checked out in its own child
> +process, we don't mix the superproject's and the submodules' files in
> +the same parallel checkout process or queue.

Ok.

> +The API
> +-------
> +
> +The parallel checkout API was designed with the goal to minimize changes
> +to the current users of the checkout machinery. This means that they
> +don't have to call a different function for sequential or parallel
> +checkout. As already mentioned, `checkout_entry()` will automatically
> +insert the given entry in the parallel checkout queue when this feature
> +is enabled and the entry is eligible; otherwise, it will just write the
> +entry right away, using the sequential code. In general, callers of the
> +parallel checkout API should look similar to this:
> +
> +----------------------------------------------
> +int pc_workers, pc_threshold, err = 0;
> +struct checkout state;
> +
> +get_parallel_checkout_configs(&pc_workers, &pc_threshold);
> +
> +/*
> + * This check is not strictly required, but it
> + * should save some time in sequential mode.
> + */

It might be nice if this comment was also in front of the real code.

> +if (pc_workers > 1)
> +       init_parallel_checkout();
> +
> +for (each cache_entry ce to-be-updated)
> +       err |= checkout_entry(ce, &state, NULL, NULL);
> +
> +err |= run_parallel_checkout(&state, pc_workers, pc_threshold, NULL, NULL);

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/5] Parallel Checkout (part 2)
  2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
                   ` (5 preceding siblings ...)
  2021-03-18 20:56 ` [PATCH 0/5] Parallel Checkout (part 2) Junio C Hamano
@ 2021-03-31  5:42 ` Christian Couder
  2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
  7 siblings, 0 replies; 40+ messages in thread
From: Christian Couder @ 2021-03-31  5:42 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, Junio C Hamano, Jeff Hostetler

On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> This is the next step in the parallel checkout implementation. An
> overview of the complete series can be seen at [1].
>
> The last patch in this series adds a design doc, so it may help to
> review it first. Also, there is no need to have any familiarity with
> part-1, as this part doesn't have any semantic dependency with that.
>
> This series is based on the merge of 'mt/parallel-checkout-part-1' and
> 'master', so that it can use the "brew cast" fix and the latest security
> fix (both from master), to run the tests. (The merge is textually
> clean, but it needs a small semantic fix: the '#include "entry.h"'
> addition in builtin/stash.c).

I took a look and left a number of comments. Otherwise it looks good overall!

Thanks!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/5] unpack-trees: add basic support for parallel checkout
  2021-03-31  4:22   ` Christian Couder
@ 2021-04-02 14:39     ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-02 14:39 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Junio C Hamano, Jeff Hostetler

On Wed, Mar 31, 2021 at 1:22 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
>
> > +struct parallel_checkout_item {
> > +       /* pointer to a istate->cache[] entry. Not owned by us. */
> > +       struct cache_entry *ce;
> > +       struct conv_attrs ca;
> > +       struct stat st;
> > +       enum pc_item_status status;
> > +};
> > +
> > +struct parallel_checkout {
> > +       enum pc_status status;
> > +       struct parallel_checkout_item *items;
> > +       size_t nr, alloc;
> > +};
>
> It looks like this struct parallel_checkout contains what is called
> the parallel checkout queue in many places. It would be a bit better
> if a name or at least a comment could make that clear.

Right, will do.

> > +       case CA_CLASS_INCORE_PROCESS:
> > +               /*
> > +                * The parallel queue and the delayed queue are not compatible,
> > +                * so they must be kept completely separated. And we can't tell
> > +                * if a long-running process will delay its response without
> > +                * actually asking it to perform the filtering. Therefore, this
> > +                * type of filter is not allowed in parallel checkout.
> > +                *
> > +                * Furthermore, there should only be one instance of the
> > +                * long-running process filter as we don't know how it is
> > +                * managing its own concurrency. So, spreading the entries that
> > +                * requisite such a filter among the parallel workers would
> > +                * require a lot more inter-process communication. We would
> > +                * probably have to designate a single process to interact with
> > +                * the filter and send all the necessary data to it, for each
> > +                * entry.
> > +                */
> > +               return 0;
>
> So it looks like we don't process entries that need filtering. It's a
> bit disappointing as the commit message says:
>
> "This new interface allows us to enqueue some of the entries being
> checked out to later uncompress, smudge, and write them in parallel."
>
> So it seems to say that entries could be smudged in parallel. Or maybe
> I am missing something?

Good point, this part of the commit message is indeed misleading. Only
internal filters, like re-encoding and end-of-line conversions, can be
performed in parallel. I'll rephrase that sentence, thanks!

> > +static int handle_results(struct checkout *state)
[...]
> > +
> > +       if (have_pending)
> > +               error(_("parallel checkout finished with pending entries"));
>
> Is this an error that can actually happen? Or is it a bug? If this can
> actually happen what is the state of the working directory [...] ?

This could happen both due to a bug or due to an actual error, e.g. if
one of the workers die()s or gets killed before finishing its work
queue. In these cases, the files associated with the unfinished items
will be missing from the working tree, and their index entries will
have null stat() information.

> [...] and what
> can the user do? Start another checkout with an option to do it
> sequentially?

Hmm, it depends on what caused the error. For example, if the user
accidentally killed one of the workers, starting another checkout
(either parallel or sequential) should be able to create the missing
files. But that might not be the case if the error was caused by a
fatal condition in one of the workers which led it to die() (like a
packet write failure or corrupted object).

Nevertheless, I think it might be interesting to rephrase the error
message here to be more explicit about the outcome state. Perhaps even
mention how many entries will be missing.

> [...]
>
> > +static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
> > +                              const char *path)
> > +{
> > +       int ret;
> > +       struct stream_filter *filter;
> > +       struct strbuf buf = STRBUF_INIT;
> > +       char *new_blob;
> > +       unsigned long size;
>
> I thought that we shouldn't use unsigned long anymore for sizes, but
> unfortunately it looks like that's what read_blob_entry() expects.
>
> > +       size_t newsize = 0;
>
> I don't think we need to initialize newsize. Also we could perhaps
> declare it closer to where it is used below.
>
> > +       ssize_t wrote;
> > +
> > +       /* Sanity check */
> > +       assert(is_eligible_for_parallel_checkout(pc_item->ce, &pc_item->ca));
> > +
> > +       filter = get_stream_filter_ca(&pc_item->ca, &pc_item->ce->oid);
> > +       if (filter) {
> > +               if (stream_blob_to_fd(fd, &pc_item->ce->oid, filter, 1)) {
> > +                       /* On error, reset fd to try writing without streaming */
> > +                       if (reset_fd(fd, path))
> > +                               return -1;
> > +               } else {
> > +                       return 0;
> > +               }
> > +       }
> > +
> > +       new_blob = read_blob_entry(pc_item->ce, &size);
> > +       if (!new_blob)
> > +               return error("unable to read sha1 file of %s (%s)", path,
>
> With the support of sha256, I guess we should avoid talking about
> sha1. Maybe: s/sha1/object/ Or maybe we could just say that we
> couldn't read the object, as read_blob_entry() would perhaps read it
> from a pack-file and not from a loose file?

Good point, thanks!

> > +                            oid_to_hex(&pc_item->ce->oid));
> > +
> > +       /*
> > +        * checkout metadata is used to give context for external process
> > +        * filters. Files requiring such filters are not eligible for parallel
> > +        * checkout, so pass NULL.
> > +        */
> > +       ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
> > +                                        new_blob, size, &buf, NULL);
> > +
> > +       if (ret) {
> > +               free(new_blob);
> > +               new_blob = strbuf_detach(&buf, &newsize);
> > +               size = newsize;
>
> "newsize" seems to be used only here. It would be nice if we could get
> rid of it and pass "&size" to strbuf_detach(), but maybe we cannot
> because of the unsigned long/size_t type mismatch. At least we could
> declare "newsize" in this scope though.

Sure, I'll move "newsize" down to this block and remove the
initialization, thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/5] parallel-checkout: make it truly parallel
  2021-03-31  4:32   ` Christian Couder
@ 2021-04-02 14:42     ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-02 14:42 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Junio C Hamano, Jeff Hostetler

On Wed, Mar 31, 2021 at 1:32 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
>
> > diff --git a/.gitignore b/.gitignore
> > index 3dcdb6bb5a..26f8ddfc55 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -33,6 +33,7 @@
> >  /git-check-mailmap
> >  /git-check-ref-format
> >  /git-checkout
> > +/git-checkout--helper
>
> I wonder if "checkout--worker" would be better than "checkout--helper".

Yeah, good idea, I'll change that.

> >  /git-checkout-index
> >  /git-cherry
> >  /git-cherry-pick
>
> [...]
>
> > +#define ASSERT_PC_ITEM_RESULT_SIZE(got, exp) \
> > +{ \
> > +       if (got != exp) \
> > +               BUG("corrupted result from checkout worker (got %dB, exp %dB)", \
>
> Maybe precompilers are smart enough to not replace the "got" and "exp"
> in the above string, but it might be a bit confusing for readers.
> Anway I wonder if this macro could just be a regular (possibly inline)
> function.

Ok, I will replace this with an inline function.

> > +                   got, exp); \
> > +} while(0)
>
> > +static void parse_and_save_result(const char *line, int len,
> > +                                 struct pc_worker *worker)
> > +{
> > +       struct pc_item_result *res;
> > +       struct parallel_checkout_item *pc_item;
> > +       struct stat *st = NULL;
> > +
> > +       if (len < PC_ITEM_RESULT_BASE_SIZE)
> > +               BUG("too short result from checkout worker (got %dB, exp %dB)",
> > +                   len, (int)PC_ITEM_RESULT_BASE_SIZE);
> > +
> > +       res = (struct pc_item_result *)line;
> > +
> > +       /*
> > +        * Worker should send either the full result struct on success, or
> > +        * just the base (i.e. no stat data), otherwise.
> > +        */
> > +       if (res->status == PC_ITEM_WRITTEN) {
> > +               ASSERT_PC_ITEM_RESULT_SIZE(len, (int)sizeof(struct pc_item_result));
> > +               st = &res->st;
> > +       } else {
> > +               ASSERT_PC_ITEM_RESULT_SIZE(len, (int)PC_ITEM_RESULT_BASE_SIZE);
> > +       }
> > +
> > +       if (!worker->nr_items_to_complete || res->id != worker->next_item_to_complete)
>
> Nit: maybe it could be useful to distinguish between these 2 potential
> bugs and have a specific BUG() for each one.

Right, will do.

> > +static void gather_results_from_workers(struct pc_worker *workers,
> > +                                       int num_workers)
> > +{
> > +       int i, active_workers = num_workers;
> > +       struct pollfd *pfds;
> > +
> > +       CALLOC_ARRAY(pfds, num_workers);
> > +       for (i = 0; i < num_workers; i++) {
> > +               pfds[i].fd = workers[i].cp.out;
> > +               pfds[i].events = POLLIN;
> > +       }
> > +
> > +       while (active_workers) {
> > +               int nr = poll(pfds, num_workers, -1);
> > +
> > +               if (nr < 0) {
> > +                       if (errno == EINTR)
> > +                               continue;
> > +                       die_errno("failed to poll checkout workers");
> > +               }
> > +
> > +               for (i = 0; i < num_workers && nr > 0; i++) {
>
> Is it possible that nr is 0? If that happens, it looks like we would
> be in an infinite `while (active_workers) { ... }` loop.
>
> Actually in poll(2) there is: "A value of 0 indicates that the call
> timed out and no file descriptors were ready." So it seems that it
> could, at least theorically, happen.

I think a 0 return might not be possible in this case because we call
poll() with -1 as the timeout, which means "infinite timeout". So the
call should block until either an error occurs (negative return code)
or there is a file descriptor available for reading (positive return
code).

> > +enum pc_item_status {
> > +       PC_ITEM_PENDING = 0,
> > +       PC_ITEM_WRITTEN,
> > +       /*
> > +        * The entry could not be written because there was another file
> > +        * already present in its path or leading directories. Since
> > +        * checkout_entry_ca() removes such files from the working tree before
> > +        * enqueueing the entry for parallel checkout, it means that there was
> > +        * a path collision among the entries being written.
> > +        */
> > +       PC_ITEM_COLLIDED,
> > +       PC_ITEM_FAILED,
> > +};
> > +
> > +struct parallel_checkout_item {
> > +       /*
> > +        * In main process ce points to a istate->cache[] entry. Thus, it's not
> > +        * owned by us. In workers they own the memory, which *must be* released.
> > +        */
> > +       struct cache_entry *ce;
> > +       struct conv_attrs ca;
> > +       size_t id; /* position in parallel_checkout.items[] of main process */
> > +
> > +       /* Output fields, sent from workers. */
> > +       enum pc_item_status status;
> > +       struct stat st;
> > +};
>
> Maybe the previous patch could have declared both 'enum
> pc_item_status' and 'struct parallel_checkout_item' here, in
> parallel-checkout.h, so that this patch wouldn't need to move them
> here.

Yeah, while I was writing this patch I went back and forth on whether
to declare these here from the start. But because I wanted to have the
"parallel-checkout users" / "checkout--helper interface" division in
parallel-checkout.h, I thought it would be better to move the structs
to this header only after the checkout--helper was introduced.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/5] parallel-checkout: add configuration options
  2021-03-31  4:33   ` Christian Couder
@ 2021-04-02 14:45     ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-02 14:45 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Junio C Hamano, Jeff Hostetler

On Wed, Mar 31, 2021 at 1:33 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
>
> > The above benchmarks show that parallel checkout is most effective on
> > repositories located on an SSD or over a distributed file system. For
> > local file systems on spinning disks, and/or older machines, the
> > parallelism does not always bring a good performance. For this reason,
> > the default value for checkout.workers is one, a.k.a. sequential
> > checkout.
>
> I wonder how many people are still using HDD, and how many will still
> use them in a few years.
>
> I think having 1 as the default value for checkout.workers might be
> good for a while for backward compatibility and stability, while
> people who are interested can test parallel checkout on different
> environments. But we might want, in a few releases, after some bugs,
> if any, have been fixed, to use a default, maybe 10, that will provide
> significant speedup for most people, and will justify the added
> complexity, especially as your numbers still show a small speedup for
> HDD when using 10.

Yeah, I agree that it would be nice to have a better default in the
near future. Unfortunately, on some other HDD machines that I tested,
parallel checkout was slower than the sequential version. So I think
it may not be possible to enable parallelism by default now.

Nevertheless, I was also experimenting with some ideas to auto-detect
if parallelism is efficient in a given environment/repo and
auto-enable it if so. One interesting possibility suggested by Ævar
[1] was to implement this as a git maintenance task, which could
periodically probe the system and tune the checkout settings
appropriately.

[1]: https://lore.kernel.org/git/87y2ixpvos.fsf@evledraar.gmail.com/

> > @@ -23,6 +25,19 @@ enum pc_status parallel_checkout_status(void)
> >         return parallel_checkout.status;
> >  }
> >
> > +#define DEFAULT_THRESHOLD_FOR_PARALLELISM 100
>
> Using a "static const int" might be a bit better.

Ok, I will change that.

> > +void get_parallel_checkout_configs(int *num_workers, int *threshold)
> > +{
> > +       if (git_config_get_int("checkout.workers", num_workers))
> > +               *num_workers = 1;
>
> I think it's better when an important default value like this "1" is
> made more visible using a "static const int" or a "#define".

Will do.

> > @@ -568,7 +581,7 @@ int run_parallel_checkout(struct checkout *state)
> >         if (parallel_checkout.nr < num_workers)
> >                 num_workers = parallel_checkout.nr;
> >
> > -       if (num_workers <= 1) {
> > +       if (num_workers <= 1 || parallel_checkout.nr < threshold) {
>
> Here we check the number of workers...
>
> >                 write_items_sequentially(state);
> >         } else {
> >                 struct pc_worker *workers = setup_workers(state, num_workers);
>
> [...]
>
> > @@ -480,7 +483,8 @@ static int check_updates(struct unpack_trees_options *o,
> >                 }
> >         }
> >         stop_progress(&progress);
> > -       errs |= run_parallel_checkout(&state);
> > +       if (pc_workers > 1)
> > +               errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);

> ...but the number of workers was already checked here.

The re-checking at run_parallel_checkout() is important because
`num_workers` might actually become <= 1 after the above check at
check_updates(). This happens when there aren't enough enqueued
entries for 2+ workers, so we fall back to sequential checkout. It
also kind of works as a safe-mechanism for the case where the
run_parallel_checkout() caller forgot to check if the user actually
wants parallelism before calling the function.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 0/5] Parallel Checkout (part 2)
  2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
                   ` (6 preceding siblings ...)
  2021-03-31  5:42 ` Christian Couder
@ 2021-04-08 16:16 ` Matheus Tavares
  2021-04-08 16:17   ` [PATCH v2 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
                     ` (7 more replies)
  7 siblings, 8 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-08 16:16 UTC (permalink / raw)
  To: git; +Cc: christian.couder, gitster, git

This is the next step in the parallel checkout implementation. As
mt/parallel-checkout-part-1 is now on master, this round is based
directly on master.

Changes since v1:

Patch 1:
- Fixed misleading sentence in commit message which said that
  parallel-checkout can smudge entries in parallel (it can only run
  internal filters in parallel).
- On write_pc_item_to_fd(), moved `newsize` declaration down to the
  scope where it is actually used and removed the initialization. 
- Also on write_pc_item_to_fd(), rephrased error message on
  read_blob_entry() to avoid mentioning 'sha1' and 'object file', which
  might not be the case there.
- Fixed typo at handle_results(): s/it's/its/.
- Added comment to 'struct parallel_checkout' to clarify what data we
  later refer as the "parallel checkout queue".
- Quoted paths in error messages.

Patch 2:
- Renamed checkout--helper to checkout--worker.
- Used packet_read() instead of packet_line_read() in both main process
  and workers. packet_line_read() is not suitable for binary data as it
  chomps a trailing newline (which could actually be part of the
  payload).
- Added check for checkout items that cannot be sent in one single
  pkt-line. (Such items are very unlikely to appear in practice, but we
  cannot ignore them.)
- Replaced macro that checks the pc_item_result size received from
  workers with an inline function.
- At parse_and_save_result(), separated one BUG() that guards two
  different conditions into two BUG()s, to help debugging.
- Suppressed error messages for finish_command() errors that don't
  come from a signal death. Such errors are already reported by the
  worker itself, so there is no need to repeat ourselves.
- Ignored SIGPIPE when writing to workers so that we can get an EPIPE
  in the pkt-line machinery and die() with an error message instead of
  nothing.

Patch 3:
- Used `static const int` instead of #define for the default parallel
  checkout config values.

Patch 5:
- Fixed typos in design doc.
- Clarified which types of filters can be applied in parallel.
- Improved wording on the 'sequential implementation' section. In
  particular, avoided using the term "layers" to refer to the sub-steps.
- Clarified sentence about the streaming interface and how it affects
  some of the steps performed by the workers.
- Clarified that the main process only removes files with --force.

Additionally, the error() and die() messages from the previous round
were mostly not marked for translation, but there were 3 strings that
were accidentally marked. I'm not sure whether parallel-checkout
messages should be translated or not, since it's a low-level machinery
that can be used by both porcelain and plumbing commands. But because
entry.c's messages are not localized, and parallel-checkout is kind of
an extension of it, I decided to remove the three last _() occurrences
in this round.


Matheus Tavares (5):
  unpack-trees: add basic support for parallel checkout
  parallel-checkout: make it truly parallel
  parallel-checkout: add configuration options
  parallel-checkout: support progress displaying
  parallel-checkout: add design documentation

 .gitignore                                    |   1 +
 Documentation/Makefile                        |   1 +
 Documentation/config/checkout.txt             |  21 +
 Documentation/technical/parallel-checkout.txt | 268 +++++++
 Makefile                                      |   2 +
 builtin.h                                     |   1 +
 builtin/checkout--worker.c                    | 145 ++++
 entry.c                                       |  17 +-
 git.c                                         |   2 +
 parallel-checkout.c                           | 658 ++++++++++++++++++
 parallel-checkout.h                           | 111 +++
 unpack-trees.c                                |  19 +-
 12 files changed, 1241 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/technical/parallel-checkout.txt
 create mode 100644 builtin/checkout--worker.c
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h

Range-diff against v1:
1:  a3383042e2 ! 1:  0506ac7159 unpack-trees: add basic support for parallel checkout
    @@ Commit message
         unpack-trees: add basic support for parallel checkout
     
         This new interface allows us to enqueue some of the entries being
    -    checked out to later uncompress, smudge, and write them in parallel. For
    -    now, the parallel checkout machinery is enabled by default and there is
    -    no user configuration, but run_parallel_checkout() just writes the
    -    queued entries in sequence (without spawning additional workers). The
    -    next patch will actually implement the parallelism and, later, we will
    -    make it configurable.
    +    checked out to later uncompress them, apply in-process filters, and
    +    write out the files in parallel. For now, the parallel checkout
    +    machinery is enabled by default and there is no user configuration, but
    +    run_parallel_checkout() just writes the queued entries in sequence
    +    (without spawning additional workers). The next patch will actually
    +    implement the parallelism and, later, we will make it configurable.
     
         Note that, to avoid potential data races, not all entries are eligible
         for parallel checkout. Also, paths that collide on disk (e.g.
    @@ parallel-checkout.c (new)
     +
     +struct parallel_checkout {
     +	enum pc_status status;
    -+	struct parallel_checkout_item *items;
    ++	struct parallel_checkout_item *items; /* The parallel checkout queue. */
     +	size_t nr, alloc;
     +};
     +
    @@ parallel-checkout.c (new)
     +			 *
     +			 * Instead, let's pay the overhead only once, now, and
     +			 * call checkout_entry_ca() again for this file, to
    -+			 * have it's stat() data stored in the index. This also
    ++			 * have its stat() data stored in the index. This also
     +			 * has the benefit of adding this entry and its
     +			 * colliding pair to the collision report message.
     +			 * Additionally, this overwriting behavior is consistent
    @@ parallel-checkout.c (new)
     +	}
     +
     +	if (have_pending)
    -+		error(_("parallel checkout finished with pending entries"));
    ++		error("parallel checkout finished with pending entries");
     +
     +	return ret;
     +}
    @@ parallel-checkout.c (new)
     +static int reset_fd(int fd, const char *path)
     +{
     +	if (lseek(fd, 0, SEEK_SET) != 0)
    -+		return error_errno("failed to rewind descriptor of %s", path);
    ++		return error_errno("failed to rewind descriptor of '%s'", path);
     +	if (ftruncate(fd, 0))
    -+		return error_errno("failed to truncate file %s", path);
    ++		return error_errno("failed to truncate file '%s'", path);
     +	return 0;
     +}
     +
    @@ parallel-checkout.c (new)
     +	struct strbuf buf = STRBUF_INIT;
     +	char *new_blob;
     +	unsigned long size;
    -+	size_t newsize = 0;
     +	ssize_t wrote;
     +
     +	/* Sanity check */
    @@ parallel-checkout.c (new)
     +
     +	new_blob = read_blob_entry(pc_item->ce, &size);
     +	if (!new_blob)
    -+		return error("unable to read sha1 file of %s (%s)", path,
    -+			     oid_to_hex(&pc_item->ce->oid));
    ++		return error("cannot read object %s '%s'",
    ++			     oid_to_hex(&pc_item->ce->oid), path);
     +
     +	/*
     +	 * checkout metadata is used to give context for external process
    @@ parallel-checkout.c (new)
     +					 new_blob, size, &buf, NULL);
     +
     +	if (ret) {
    ++		size_t newsize;
     +		free(new_blob);
     +		new_blob = strbuf_detach(&buf, &newsize);
     +		size = newsize;
    @@ parallel-checkout.c (new)
     +	wrote = write_in_full(fd, new_blob, size);
     +	free(new_blob);
     +	if (wrote < 0)
    -+		return error("unable to write file %s", path);
    ++		return error("unable to write file '%s'", path);
     +
     +	return 0;
     +}
    @@ parallel-checkout.c (new)
     +			 */
     +			pc_item->status = PC_ITEM_COLLIDED;
     +		} else {
    -+			error_errno("failed to open file %s", path.buf);
    ++			error_errno("failed to open file '%s'", path.buf);
     +			pc_item->status = PC_ITEM_FAILED;
     +		}
     +		goto out;
    @@ parallel-checkout.c (new)
     +	fstat_done = fstat_checkout_output(fd, state, &pc_item->st);
     +
     +	if (close_and_clear(&fd)) {
    -+		error_errno("unable to close file %s", path.buf);
    ++		error_errno("unable to close file '%s'", path.buf);
     +		pc_item->status = PC_ITEM_FAILED;
     +		goto out;
     +	}
     +
     +	if (state->refresh_cache && !fstat_done && lstat(path.buf, &pc_item->st) < 0) {
    -+		error_errno("unable to stat just-written file %s",  path.buf);
    ++		error_errno("unable to stat just-written file '%s'",  path.buf);
     +		pc_item->status = PC_ITEM_FAILED;
     +		goto out;
     +	}
2:  fca797698c ! 2:  0d65b517bd parallel-checkout: make it truly parallel
    @@ .gitignore
      /git-check-mailmap
      /git-check-ref-format
      /git-checkout
    -+/git-checkout--helper
    ++/git-checkout--worker
      /git-checkout-index
      /git-cherry
      /git-cherry-pick
    @@ Makefile: BUILTIN_OBJS += builtin/check-attr.o
      BUILTIN_OBJS += builtin/check-ignore.o
      BUILTIN_OBJS += builtin/check-mailmap.o
      BUILTIN_OBJS += builtin/check-ref-format.o
    -+BUILTIN_OBJS += builtin/checkout--helper.o
    ++BUILTIN_OBJS += builtin/checkout--worker.o
      BUILTIN_OBJS += builtin/checkout-index.o
      BUILTIN_OBJS += builtin/checkout.o
      BUILTIN_OBJS += builtin/clean.o
    @@ builtin.h: int cmd_bugreport(int argc, const char **argv, const char *prefix);
      int cmd_bundle(int argc, const char **argv, const char *prefix);
      int cmd_cat_file(int argc, const char **argv, const char *prefix);
      int cmd_checkout(int argc, const char **argv, const char *prefix);
    -+int cmd_checkout__helper(int argc, const char **argv, const char *prefix);
    ++int cmd_checkout__worker(int argc, const char **argv, const char *prefix);
      int cmd_checkout_index(int argc, const char **argv, const char *prefix);
      int cmd_check_attr(int argc, const char **argv, const char *prefix);
      int cmd_check_ignore(int argc, const char **argv, const char *prefix);
     
    - ## builtin/checkout--helper.c (new) ##
    + ## builtin/checkout--worker.c (new) ##
     @@
     +#include "builtin.h"
     +#include "config.h"
    @@ builtin/checkout--helper.c (new)
     +#include "parse-options.h"
     +#include "pkt-line.h"
     +
    -+static void packet_to_pc_item(char *line, int len,
    ++static void packet_to_pc_item(const char *buffer, int len,
     +			      struct parallel_checkout_item *pc_item)
     +{
    -+	struct pc_item_fixed_portion *fixed_portion;
    -+	char *encoding, *variant;
    ++	const struct pc_item_fixed_portion *fixed_portion;
    ++	const char *variant;
    ++	char *encoding;
     +
     +	if (len < sizeof(struct pc_item_fixed_portion))
     +		BUG("checkout worker received too short item (got %dB, exp %dB)",
     +		    len, (int)sizeof(struct pc_item_fixed_portion));
     +
    -+	fixed_portion = (struct pc_item_fixed_portion *)line;
    ++	fixed_portion = (struct pc_item_fixed_portion *)buffer;
     +
     +	if (len - sizeof(struct pc_item_fixed_portion) !=
     +		fixed_portion->name_len + fixed_portion->working_tree_encoding_len)
     +		BUG("checkout worker received corrupted item");
     +
    -+	variant = line + sizeof(struct pc_item_fixed_portion);
    ++	variant = buffer + sizeof(struct pc_item_fixed_portion);
     +
     +	/*
     +	 * Note: the main process uses zero length to communicate that the
    @@ builtin/checkout--helper.c (new)
     +	size_t i, nr = 0, alloc = 0;
     +
     +	while (1) {
    -+		int len;
    -+		char *line = packet_read_line(0, &len);
    ++		int len = packet_read(0, NULL, NULL, packet_buffer,
    ++				      sizeof(packet_buffer), 0);
     +
    -+		if (!line)
    ++		if (len < 0)
    ++			BUG("packet_read() returned negative value");
    ++		else if (!len)
     +			break;
     +
     +		ALLOC_GROW(items, nr + 1, alloc);
    -+		packet_to_pc_item(line, len, &items[nr++]);
    ++		packet_to_pc_item(packet_buffer, len, &items[nr++]);
     +	}
     +
     +	for (i = 0; i < nr; i++) {
    @@ builtin/checkout--helper.c (new)
     +	free(items);
     +}
     +
    -+static const char * const checkout_helper_usage[] = {
    -+	N_("git checkout--helper [<options>]"),
    ++static const char * const checkout_worker_usage[] = {
    ++	N_("git checkout--worker [<options>]"),
     +	NULL
     +};
     +
    -+int cmd_checkout__helper(int argc, const char **argv, const char *prefix)
    ++int cmd_checkout__worker(int argc, const char **argv, const char *prefix)
     +{
     +	struct checkout state = CHECKOUT_INIT;
    -+	struct option checkout_helper_options[] = {
    ++	struct option checkout_worker_options[] = {
     +		OPT_STRING(0, "prefix", &state.base_dir, N_("string"),
     +			N_("when creating files, prepend <string>")),
     +		OPT_END()
     +	};
     +
     +	if (argc == 2 && !strcmp(argv[1], "-h"))
    -+		usage_with_options(checkout_helper_usage,
    -+				   checkout_helper_options);
    ++		usage_with_options(checkout_worker_usage,
    ++				   checkout_worker_options);
     +
     +	git_config(git_default_config, NULL);
    -+	argc = parse_options(argc, argv, prefix, checkout_helper_options,
    -+			     checkout_helper_usage, 0);
    ++	argc = parse_options(argc, argv, prefix, checkout_worker_options,
    ++			     checkout_worker_usage, 0);
     +	if (argc > 0)
    -+		usage_with_options(checkout_helper_usage, checkout_helper_options);
    ++		usage_with_options(checkout_worker_usage, checkout_worker_options);
     +
     +	if (state.base_dir)
     +		state.base_dir_len = strlen(state.base_dir);
    @@ git.c: static struct cmd_struct commands[] = {
      	{ "check-mailmap", cmd_check_mailmap, RUN_SETUP },
      	{ "check-ref-format", cmd_check_ref_format, NO_PARSEOPT  },
      	{ "checkout", cmd_checkout, RUN_SETUP | NEED_WORK_TREE },
    -+	{ "checkout--helper", cmd_checkout__helper,
    ++	{ "checkout--worker", cmd_checkout__worker,
     +		RUN_SETUP | NEED_WORK_TREE | SUPPORT_SUPER_PREFIX },
      	{ "checkout-index", cmd_checkout_index,
      		RUN_SETUP | NEED_WORK_TREE},
    @@ parallel-checkout.c
      #include "parallel-checkout.h"
     +#include "pkt-line.h"
     +#include "run-command.h"
    ++#include "sigchain.h"
      #include "streaming.h"
      
     -enum pc_item_status {
    @@ parallel-checkout.c
      };
      
      struct parallel_checkout {
    +@@ parallel-checkout.c: static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
    + 					     const struct conv_attrs *ca)
    + {
    + 	enum conv_attrs_classification c;
    ++	size_t packed_item_size;
    + 
    + 	/*
    + 	 * Symlinks cannot be checked out in parallel as, in case of path
    +@@ parallel-checkout.c: static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
    + 	if (!S_ISREG(ce->ce_mode))
    + 		return 0;
    + 
    ++	packed_item_size = sizeof(struct pc_item_fixed_portion) + ce->ce_namelen +
    ++		(ca->working_tree_encoding ? strlen(ca->working_tree_encoding) : 0);
    ++
    ++	/*
    ++	 * The amount of data we send to the workers per checkout item is
    ++	 * typically small (75~300B). So unless we find an insanely huge path
    ++	 * of 64KB, we should never reach the 65KB limit of one pkt-line. If
    ++	 * that does happen, we let the sequential code handle the item.
    ++	 */
    ++	if (packed_item_size > LARGE_PACKET_DATA_MAX)
    ++		return 0;
    ++
    + 	c = classify_conv_attrs(ca);
    + 	switch (c) {
    + 	case CA_CLASS_INCORE:
     @@ parallel-checkout.c: int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
      	ALLOC_GROW(parallel_checkout.items, parallel_checkout.nr + 1,
      		   parallel_checkout.alloc);
    @@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
     +	size_t working_tree_encoding_len = working_tree_encoding ?
     +					   strlen(working_tree_encoding) : 0;
     +
    ++	/*
    ++	 * Any changes in the calculation of the message size must also be made
    ++	 * in is_eligible_for_parallel_checkout().
    ++	 */
     +	len_data = sizeof(struct pc_item_fixed_portion) + name_len +
     +		   working_tree_encoding_len;
     +
    @@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
     +static void send_batch(int fd, size_t start, size_t nr)
     +{
     +	size_t i;
    ++	sigchain_push(SIGPIPE, SIG_IGN);
     +	for (i = 0; i < nr; i++)
     +		send_one_item(fd, &parallel_checkout.items[start + i]);
     +	packet_flush(fd);
    ++	sigchain_pop(SIGPIPE);
     +}
     +
     +static struct pc_worker *setup_workers(struct checkout *state, int num_workers)
    @@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
     +		cp->in = -1;
     +		cp->out = -1;
     +		cp->clean_on_exit = 1;
    -+		strvec_push(&cp->args, "checkout--helper");
    ++		strvec_push(&cp->args, "checkout--worker");
     +		if (state->base_dir_len)
     +			strvec_pushf(&cp->args, "--prefix=%s", state->base_dir);
     +		if (start_command(cp))
    -+			die(_("failed to spawn checkout worker"));
    ++			die("failed to spawn checkout worker");
     +	}
     +
     +	base_batch_size = parallel_checkout.nr / num_workers;
    @@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
     +	}
     +
     +	for (i = 0; i < num_workers; i++) {
    -+		if (finish_command(&workers[i].cp))
    -+			error(_("checkout worker %d finished with error"), i);
    ++		int rc = finish_command(&workers[i].cp);
    ++		if (rc > 128) {
    ++			/*
    ++			 * For a normal non-zero exit, the worker should have
    ++			 * already printed something useful to stderr. But a
    ++			 * death by signal should be mentioned to the user.
    ++			 */
    ++			error("checkout worker %d died of signal %d", i, rc - 128);
    ++		}
     +	}
     +
     +	free(workers);
     +}
     +
    -+#define ASSERT_PC_ITEM_RESULT_SIZE(got, exp) \
    -+{ \
    -+	if (got != exp) \
    -+		BUG("corrupted result from checkout worker (got %dB, exp %dB)", \
    -+		    got, exp); \
    -+} while(0)
    ++static inline void assert_pc_item_result_size(int got, int exp)
    ++{
    ++	if (got != exp)
    ++		BUG("wrong result size from checkout worker (got %dB, exp %dB)",
    ++		    got, exp);
    ++}
     +
    -+static void parse_and_save_result(const char *line, int len,
    ++static void parse_and_save_result(const char *buffer, int len,
     +				  struct pc_worker *worker)
     +{
     +	struct pc_item_result *res;
    @@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
     +	struct stat *st = NULL;
     +
     +	if (len < PC_ITEM_RESULT_BASE_SIZE)
    -+		BUG("too short result from checkout worker (got %dB, exp %dB)",
    ++		BUG("too short result from checkout worker (got %dB, exp >=%dB)",
     +		    len, (int)PC_ITEM_RESULT_BASE_SIZE);
     +
    -+	res = (struct pc_item_result *)line;
    ++	res = (struct pc_item_result *)buffer;
     +
     +	/*
     +	 * Worker should send either the full result struct on success, or
     +	 * just the base (i.e. no stat data), otherwise.
     +	 */
     +	if (res->status == PC_ITEM_WRITTEN) {
    -+		ASSERT_PC_ITEM_RESULT_SIZE(len, (int)sizeof(struct pc_item_result));
    ++		assert_pc_item_result_size(len, (int)sizeof(struct pc_item_result));
     +		st = &res->st;
     +	} else {
    -+		ASSERT_PC_ITEM_RESULT_SIZE(len, (int)PC_ITEM_RESULT_BASE_SIZE);
    ++		assert_pc_item_result_size(len, (int)PC_ITEM_RESULT_BASE_SIZE);
     +	}
     +
    -+	if (!worker->nr_items_to_complete || res->id != worker->next_item_to_complete)
    -+		BUG("checkout worker sent unexpected item id");
    ++	if (!worker->nr_items_to_complete)
    ++		BUG("received result from supposedly finished checkout worker");
    ++	if (res->id != worker->next_item_to_complete)
    ++		BUG("unexpected item id from checkout worker (got %"PRIuMAX", exp %"PRIuMAX")",
    ++		    (uintmax_t)res->id, (uintmax_t)worker->next_item_to_complete);
     +
     +	worker->next_item_to_complete++;
     +	worker->nr_items_to_complete--;
    @@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
     +		pc_item->st = *st;
     +}
     +
    -+
     +static void gather_results_from_workers(struct pc_worker *workers,
     +					int num_workers)
     +{
    @@ parallel-checkout.c: static void write_pc_item(struct parallel_checkout_item *pc
     +				continue;
     +
     +			if (pfd->revents & POLLIN) {
    -+				int len;
    -+				const char *line = packet_read_line(pfd->fd, &len);
    ++				int len = packet_read(pfd->fd, NULL, NULL,
    ++						      packet_buffer,
    ++						      sizeof(packet_buffer), 0);
     +
    -+				if (!line) {
    ++				if (len < 0) {
    ++					BUG("packet_read() returned negative value");
    ++				} else if (!len) {
     +					pfd->fd = -1;
     +					active_workers--;
     +				} else {
    -+					parse_and_save_result(line, len, worker);
    ++					parse_and_save_result(packet_buffer,
    ++							      len, worker);
     +				}
     +			} else if (pfd->revents & POLLHUP) {
     +				pfd->fd = -1;
     +				active_workers--;
     +			} else if (pfd->revents & (POLLNVAL | POLLERR)) {
    -+				die(_("error polling from checkout worker"));
    ++				die("error polling from checkout worker");
     +			}
     +
     +			nr--;
    @@ parallel-checkout.c: static void write_items_sequentially(struct checkout *state
      		write_pc_item(&parallel_checkout.items[i], state);
      }
      
    -+#define DEFAULT_NUM_WORKERS 2
    ++static const int DEFAULT_NUM_WORKERS = 2;
     +
      int run_parallel_checkout(struct checkout *state)
      {
    @@ parallel-checkout.h: int enqueue_checkout(struct cache_entry *ce, struct conv_at
      int run_parallel_checkout(struct checkout *state);
      
     +/****************************************************************
    -+ * Interface with checkout--helper
    ++ * Interface with checkout--worker
     + ****************************************************************/
     +
     +enum pc_item_status {
3:  8c83e92445 ! 3:  6ea057f9c5 parallel-checkout: add configuration options
    @@ Commit message
         checkout showed to be beneficial. This time, we compared the runtime of
         a `git checkout -f`, with and without parallelism, after randomly
         removing an increasing number of files from the Linux working tree. The
    -    "sequential fallback" column bellow corresponds to the executions where
    +    "sequential fallback" column below corresponds to the executions where
         checkout.workers was 10 but checkout.thresholdForParallelism was equal
         to the number of to-be-updated files plus one (so that we end up writing
         sequentially). Each test case was sampled 15 times, and each sample had
    @@ parallel-checkout.c
      #include "parallel-checkout.h"
      #include "pkt-line.h"
      #include "run-command.h"
    + #include "sigchain.h"
      #include "streaming.h"
     +#include "thread-utils.h"
      
    @@ parallel-checkout.c: enum pc_status parallel_checkout_status(void)
      	return parallel_checkout.status;
      }
      
    -+#define DEFAULT_THRESHOLD_FOR_PARALLELISM 100
    ++static const int DEFAULT_THRESHOLD_FOR_PARALLELISM = 100;
    ++static const int DEFAULT_NUM_WORKERS = 1;
     +
     +void get_parallel_checkout_configs(int *num_workers, int *threshold)
     +{
     +	if (git_config_get_int("checkout.workers", num_workers))
    -+		*num_workers = 1;
    ++		*num_workers = DEFAULT_NUM_WORKERS;
     +	else if (*num_workers < 1)
     +		*num_workers = online_cpus();
     +
    @@ parallel-checkout.c: static void write_items_sequentially(struct checkout *state
      		write_pc_item(&parallel_checkout.items[i], state);
      }
      
    --#define DEFAULT_NUM_WORKERS 2
    +-static const int DEFAULT_NUM_WORKERS = 2;
     -
     -int run_parallel_checkout(struct checkout *state)
     +int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
    @@ parallel-checkout.h: void init_parallel_checkout(void);
     +int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
      
      /****************************************************************
    -  * Interface with checkout--helper
    +  * Interface with checkout--worker
     
      ## unpack-trees.c ##
     @@ unpack-trees.c: static int check_updates(struct unpack_trees_options *o,
4:  f7432e15dd ! 4:  0ac4bee67e parallel-checkout: support progress displaying
    @@ parallel-checkout.c
      #include "pkt-line.h"
     +#include "progress.h"
      #include "run-command.h"
    + #include "sigchain.h"
      #include "streaming.h"
    - #include "thread-utils.h"
     @@ parallel-checkout.c: struct parallel_checkout {
      	enum pc_status status;
    - 	struct parallel_checkout_item *items;
    + 	struct parallel_checkout_item *items; /* The parallel checkout queue. */
      	size_t nr, alloc;
     +	struct progress *progress;
     +	unsigned int *progress_cnt;
    @@ parallel-checkout.c: static int handle_results(struct checkout *state)
      			break;
      		case PC_ITEM_PENDING:
      			have_pending = 1;
    -@@ parallel-checkout.c: static void parse_and_save_result(const char *line, int len,
    +@@ parallel-checkout.c: static void parse_and_save_result(const char *buffer, int len,
      	pc_item->status = res->status;
      	if (st)
      		pc_item->st = *st;
    @@ parallel-checkout.c: static void parse_and_save_result(const char *line, int len
     +		advance_progress_meter();
      }
      
    - 
    + static void gather_results_from_workers(struct pc_worker *workers,
     @@ parallel-checkout.c: static void write_items_sequentially(struct checkout *state)
      {
      	size_t i;
    @@ parallel-checkout.h: void init_parallel_checkout(void);
     +			  struct progress *progress, unsigned int *progress_cnt);
      
      /****************************************************************
    -  * Interface with checkout--helper
    +  * Interface with checkout--worker
     
      ## unpack-trees.c ##
     @@ unpack-trees.c: static int check_updates(struct unpack_trees_options *o,
5:  0592740ec1 ! 5:  087f8bdf35 parallel-checkout: add design documentation
    @@ Documentation/technical/parallel-checkout.txt (new)
     +==============================
     +
     +The "Parallel Checkout" feature attempts to use multiple processes to
    -+parallelize the work of uncompressing, smudging, and writing blobs
    -+during a checkout operation. It can be used by all checkout-related
    -+commands, such as `clone`, `checkout`, `reset`, `sparse-checkout`, and
    -+others.
    ++parallelize the work of uncompressing the blobs, applying in-core
    ++filters, and writing the resulting contents to the working tree during a
    ++checkout operation. It can be used by all checkout-related commands,
    ++such as `clone`, `checkout`, `reset`, `sparse-checkout`, and others.
     +
     +These commands share the following basic structure:
     +
    @@ Documentation/technical/parallel-checkout.txt (new)
     +-------------------------
     +
     +For the purposes of discussion here, the current sequential
    -+implementation of Step 3 has 3 layers:
    ++implementation of Step 3 is divided in 3 parts, each one implemented in
    ++its own function:
     +
     +* Step 3a: `unpack-trees.c:check_updates()` contains a series of
     +  sequential loops iterating over the `cache_entry`'s array. The main
    -+  loop in this function calls the next layer for each of the
    ++  loop in this function calls the Step 3b function for each of the
     +  to-be-updated entries.
     +
     +* Step 3b: `entry.c:checkout_entry()` examines the existing working tree
     +  for file conflicts, collisions, and unsaved changes. It removes files
    -+  and create leading directories as necessary. It calls the next layer
    -+  for each entry to be written.
    ++  and creates leading directories as necessary. It calls the Step 3c
    ++  function for each entry to be written.
     +
     +* Step 3c: `entry.c:write_entry()` loads the blob into memory, smudges
     +  it if necessary, creates the file in the working tree, writes the
    @@ Documentation/technical/parallel-checkout.txt (new)
     +----------------------
     +
     +Parallel checkout alters the aforementioned Step 3 to use multiple
    -+`checkout--helper` background processes to distribute the work. The
    ++`checkout--worker` background processes to distribute the work. The
     +long-running worker processes are controlled by the foreground Git
     +command using the existing run-command API.
     +
    @@ Documentation/technical/parallel-checkout.txt (new)
     +~~~~~~~~
     +
     +Step 3b is only slightly altered; for each entry to be checked out, the
    -+main process:
    ++main process performs the following steps:
     +
    -+* M1: Checks whether there is any untracked or unclean file in the
    -+  working tree which would be overwritten by this entry, and decides
    ++* M1: Check whether there is any untracked or unclean file in the
    ++  working tree which would be overwritten by this entry, and decide
     +  whether to proceed (removing the file(s)) or not.
     +
    -+* M2: Creates the leading directories.
    ++* M2: Create the leading directories.
     +
    -+* M3: Loads the conversion attributes for the entry's path.
    ++* M3: Load the conversion attributes for the entry's path.
     +
    -+* M4: Checks, based on the entry's type and conversion attributes,
    ++* M4: Check, based on the entry's type and conversion attributes,
     +  whether the entry is eligible for parallel checkout (more on this
    -+  later). If it is eligible, enqueues the entry and the loaded
    -+  attributes to later write the entry in parallel. If not, writes the
    ++  later). If it is eligible, enqueue the entry and the loaded
    ++  attributes to later write the entry in parallel. If not, write the
     +  entry right away, using the default sequential code.
     +
     +Note: we save the conversion attributes associated with each entry
    @@ Documentation/technical/parallel-checkout.txt (new)
     +* W3: Loads the blob into memory (inflating and delta reconstructing
     +  it).
     +
    -+* W4: Filters the blob.
    ++* W4: Applies any required in-process filter, like end-of-line
    ++  conversion and re-encoding.
     +
     +* W5: Writes the result to the file descriptor opened at W2.
     +
    @@ Documentation/technical/parallel-checkout.txt (new)
     +  the result back to the main process, together with the end status of
     +  the operation and the item's identification number.
     +
    -+Note that steps W3 to W5 might actually be performed together, using the
    -+streaming interface.
    ++Note that, when possible, steps W3 to W5 are delegated to the streaming
    ++machinery, removing the need to keep the entire blob in memory.
     +
    -+Also note that the workers *never* remove any files. As mentioned
    -+earlier, it is the responsibility of the main process to remove any
    -+files that block the checkout operation (or abort it). This is crucial
    -+to avoid race conditions and also to properly detect path collisions at
    -+Step W1.
    ++Also note that the workers *never* remove any file. As mentioned
    ++earlier, it is the responsibility of the main process to remove any file
    ++that blocks the checkout operation (or abort if removing the file(s)
    ++would cause data loss and the user didn't ask to `--force`). This is
    ++crucial to avoid race conditions and also to properly detect path
    ++collisions at Step W1.
     +
     +After the workers finish writing the items and sending back the required
     +information, the main process handles the results in two steps:
    @@ Documentation/technical/parallel-checkout.txt (new)
     +deals with files that were already present in the working tree before
     +checkout. Basically, it checks if the path that it wants to write
     +already exists on disk, makes sure the existing file doesn't have
    -+unsaved data, and then overwrite it. (To be more pedantic: it deletes
    ++unsaved data, and then overwrites it. (To be more pedantic: it deletes
     +the existing file and creates the new one.) So, if there are multiple
     +colliding files to be checked out, the sequential code will write each
     +one of them but only the last will actually survive on disk.
    @@ Documentation/technical/parallel-checkout.txt (new)
     +remove all files that prevent this entry from being written (before
     +enqueueing it). This includes any non-directory file in the leading path
     +of the entry. Later, when a worker gets assigned the entry, it looks
    -+again for the non-directories files and for an already existent file at
    ++again for the non-directories files and for an already existing file at
     +the entry's path. If any of these checks finds something, the worker
     +knows that there was a path collision.
     +
    @@ Documentation/technical/parallel-checkout.txt (new)
     +postpone the return of some filtered blobs. The delayed checkout queue
     +and the parallel checkout queue are not compatible and should remain
     +separated.
    +++
    ++Note: regular files that only require internal filters, like end-of-line
    ++conversion and re-encoding, are eligible for parallel checkout.
     +
     +Ineligible entries are checked out by the classic sequential codepath
     +*before* spawning workers.
     +
     +Note: submodules's files are also eligible for parallel checkout (as
    -+long as they don't fall into the two excluding categories mentioned
    ++long as they don't fall into any of the excluding categories mentioned
     +above). But since each submodule is checked out in its own child
     +process, we don't mix the superproject's and the submodules' files in
     +the same parallel checkout process or queue.
-- 
2.30.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 1/5] unpack-trees: add basic support for parallel checkout
  2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
@ 2021-04-08 16:17   ` Matheus Tavares
  2021-04-08 16:17   ` [PATCH v2 2/5] parallel-checkout: make it truly parallel Matheus Tavares
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-08 16:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, gitster, git

This new interface allows us to enqueue some of the entries being
checked out to later uncompress them, apply in-process filters, and
write out the files in parallel. For now, the parallel checkout
machinery is enabled by default and there is no user configuration, but
run_parallel_checkout() just writes the queued entries in sequence
(without spawning additional workers). The next patch will actually
implement the parallelism and, later, we will make it configurable.

Note that, to avoid potential data races, not all entries are eligible
for parallel checkout. Also, paths that collide on disk (e.g.
case-sensitive paths in case-insensitive file systems), are detected by
the parallel checkout code and skipped, so that they can be safely
sequentially handled later. The collision detection works like the
following:

- If the collision was at basename (e.g. 'a/b' and 'a/B'), the framework
  detects it by looking for EEXIST and EISDIR errors after an
  open(O_CREAT | O_EXCL) failure.

- If the collision was at dirname (e.g. 'a/b' and 'A'), it is detected
  at the has_dirs_only_path() check, which is done for the leading path
  of each item in the parallel checkout queue.

Both verifications rely on the fact that, before enqueueing an entry for
parallel checkout, checkout_entry() makes sure that there is no file at
the entry's path and that its leading components are all real
directories. So, any later change in these conditions indicates that
there was a collision (either between two parallel-eligible entries or
between an eligible and an ineligible one).

After all parallel-eligible entries have been processed, the collided
(and thus, skipped) entries are sequentially fed to checkout_entry()
again. This is similar to the way the current code deals with
collisions, overwriting the previously checked out entries with the
subsequent ones. The only difference is that, since we no longer create
the files in the same order that they appear on index, we are not able
to determine which of the colliding entries will survive on disk (for
the classic code, it is always the last entry).

Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Makefile            |   1 +
 entry.c             |  17 +-
 parallel-checkout.c | 368 ++++++++++++++++++++++++++++++++++++++++++++
 parallel-checkout.h |  32 ++++
 unpack-trees.c      |   6 +-
 5 files changed, 421 insertions(+), 3 deletions(-)
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h

diff --git a/Makefile b/Makefile
index a6a73c5741..99734cc2ea 100644
--- a/Makefile
+++ b/Makefile
@@ -946,6 +946,7 @@ LIB_OBJS += pack-revindex.o
 LIB_OBJS += pack-write.o
 LIB_OBJS += packfile.o
 LIB_OBJS += pager.o
+LIB_OBJS += parallel-checkout.o
 LIB_OBJS += parse-options-cb.o
 LIB_OBJS += parse-options.o
 LIB_OBJS += patch-delta.o
diff --git a/entry.c b/entry.c
index 2dc94ba5cc..d7ed38aa40 100644
--- a/entry.c
+++ b/entry.c
@@ -7,6 +7,7 @@
 #include "progress.h"
 #include "fsmonitor.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -426,8 +427,17 @@ static void mark_colliding_entries(const struct checkout *state,
 	for (i = 0; i < state->istate->cache_nr; i++) {
 		struct cache_entry *dup = state->istate->cache[i];
 
-		if (dup == ce)
-			break;
+		if (dup == ce) {
+			/*
+			 * Parallel checkout doesn't create the files in index
+			 * order. So the other side of the collision may appear
+			 * after the given cache_entry in the array.
+			 */
+			if (parallel_checkout_status() == PC_RUNNING)
+				continue;
+			else
+				break;
+		}
 
 		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
 			continue;
@@ -536,6 +546,9 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
 		ca = &ca_buf;
 	}
 
+	if (!enqueue_checkout(ce, ca))
+		return 0;
+
 	return write_entry(ce, path.buf, ca, state, 0);
 }
 
diff --git a/parallel-checkout.c b/parallel-checkout.c
new file mode 100644
index 0000000000..80c839837b
--- /dev/null
+++ b/parallel-checkout.c
@@ -0,0 +1,368 @@
+#include "cache.h"
+#include "entry.h"
+#include "parallel-checkout.h"
+#include "streaming.h"
+
+enum pc_item_status {
+	PC_ITEM_PENDING = 0,
+	PC_ITEM_WRITTEN,
+	/*
+	 * The entry could not be written because there was another file
+	 * already present in its path or leading directories. Since
+	 * checkout_entry_ca() removes such files from the working tree before
+	 * enqueueing the entry for parallel checkout, it means that there was
+	 * a path collision among the entries being written.
+	 */
+	PC_ITEM_COLLIDED,
+	PC_ITEM_FAILED,
+};
+
+struct parallel_checkout_item {
+	/* pointer to a istate->cache[] entry. Not owned by us. */
+	struct cache_entry *ce;
+	struct conv_attrs ca;
+	struct stat st;
+	enum pc_item_status status;
+};
+
+struct parallel_checkout {
+	enum pc_status status;
+	struct parallel_checkout_item *items; /* The parallel checkout queue. */
+	size_t nr, alloc;
+};
+
+static struct parallel_checkout parallel_checkout;
+
+enum pc_status parallel_checkout_status(void)
+{
+	return parallel_checkout.status;
+}
+
+void init_parallel_checkout(void)
+{
+	if (parallel_checkout.status != PC_UNINITIALIZED)
+		BUG("parallel checkout already initialized");
+
+	parallel_checkout.status = PC_ACCEPTING_ENTRIES;
+}
+
+static void finish_parallel_checkout(void)
+{
+	if (parallel_checkout.status == PC_UNINITIALIZED)
+		BUG("cannot finish parallel checkout: not initialized yet");
+
+	free(parallel_checkout.items);
+	memset(&parallel_checkout, 0, sizeof(parallel_checkout));
+}
+
+static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
+					     const struct conv_attrs *ca)
+{
+	enum conv_attrs_classification c;
+
+	/*
+	 * Symlinks cannot be checked out in parallel as, in case of path
+	 * collision, they could racily replace leading directories of other
+	 * entries being checked out. Submodules are checked out in child
+	 * processes, which have their own parallel checkout queues.
+	 */
+	if (!S_ISREG(ce->ce_mode))
+		return 0;
+
+	c = classify_conv_attrs(ca);
+	switch (c) {
+	case CA_CLASS_INCORE:
+		return 1;
+
+	case CA_CLASS_INCORE_FILTER:
+		/*
+		 * It would be safe to allow concurrent instances of
+		 * single-file smudge filters, like rot13, but we should not
+		 * assume that all filters are parallel-process safe. So we
+		 * don't allow this.
+		 */
+		return 0;
+
+	case CA_CLASS_INCORE_PROCESS:
+		/*
+		 * The parallel queue and the delayed queue are not compatible,
+		 * so they must be kept completely separated. And we can't tell
+		 * if a long-running process will delay its response without
+		 * actually asking it to perform the filtering. Therefore, this
+		 * type of filter is not allowed in parallel checkout.
+		 *
+		 * Furthermore, there should only be one instance of the
+		 * long-running process filter as we don't know how it is
+		 * managing its own concurrency. So, spreading the entries that
+		 * requisite such a filter among the parallel workers would
+		 * require a lot more inter-process communication. We would
+		 * probably have to designate a single process to interact with
+		 * the filter and send all the necessary data to it, for each
+		 * entry.
+		 */
+		return 0;
+
+	case CA_CLASS_STREAMABLE:
+		return 1;
+
+	default:
+		BUG("unsupported conv_attrs classification '%d'", c);
+	}
+}
+
+int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
+{
+	struct parallel_checkout_item *pc_item;
+
+	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES ||
+	    !is_eligible_for_parallel_checkout(ce, ca))
+		return -1;
+
+	ALLOC_GROW(parallel_checkout.items, parallel_checkout.nr + 1,
+		   parallel_checkout.alloc);
+
+	pc_item = &parallel_checkout.items[parallel_checkout.nr++];
+	pc_item->ce = ce;
+	memcpy(&pc_item->ca, ca, sizeof(pc_item->ca));
+	pc_item->status = PC_ITEM_PENDING;
+
+	return 0;
+}
+
+static int handle_results(struct checkout *state)
+{
+	int ret = 0;
+	size_t i;
+	int have_pending = 0;
+
+	/*
+	 * We first update the successfully written entries with the collected
+	 * stat() data, so that they can be found by mark_colliding_entries(),
+	 * in the next loop, when necessary.
+	 */
+	for (i = 0; i < parallel_checkout.nr; i++) {
+		struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+		if (pc_item->status == PC_ITEM_WRITTEN)
+			update_ce_after_write(state, pc_item->ce, &pc_item->st);
+	}
+
+	for (i = 0; i < parallel_checkout.nr; i++) {
+		struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+
+		switch(pc_item->status) {
+		case PC_ITEM_WRITTEN:
+			/* Already handled */
+			break;
+		case PC_ITEM_COLLIDED:
+			/*
+			 * The entry could not be checked out due to a path
+			 * collision with another entry. Since there can only
+			 * be one entry of each colliding group on the disk, we
+			 * could skip trying to check out this one and move on.
+			 * However, this would leave the unwritten entries with
+			 * null stat() fields on the index, which could
+			 * potentially slow down subsequent operations that
+			 * require refreshing it: git would not be able to
+			 * trust st_size and would have to go to the filesystem
+			 * to see if the contents match (see ie_modified()).
+			 *
+			 * Instead, let's pay the overhead only once, now, and
+			 * call checkout_entry_ca() again for this file, to
+			 * have its stat() data stored in the index. This also
+			 * has the benefit of adding this entry and its
+			 * colliding pair to the collision report message.
+			 * Additionally, this overwriting behavior is consistent
+			 * with what the sequential checkout does, so it doesn't
+			 * add any extra overhead.
+			 */
+			ret |= checkout_entry_ca(pc_item->ce, &pc_item->ca,
+						 state, NULL, NULL);
+			break;
+		case PC_ITEM_PENDING:
+			have_pending = 1;
+			/* fall through */
+		case PC_ITEM_FAILED:
+			ret = -1;
+			break;
+		default:
+			BUG("unknown checkout item status in parallel checkout");
+		}
+	}
+
+	if (have_pending)
+		error("parallel checkout finished with pending entries");
+
+	return ret;
+}
+
+static int reset_fd(int fd, const char *path)
+{
+	if (lseek(fd, 0, SEEK_SET) != 0)
+		return error_errno("failed to rewind descriptor of '%s'", path);
+	if (ftruncate(fd, 0))
+		return error_errno("failed to truncate file '%s'", path);
+	return 0;
+}
+
+static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
+			       const char *path)
+{
+	int ret;
+	struct stream_filter *filter;
+	struct strbuf buf = STRBUF_INIT;
+	char *new_blob;
+	unsigned long size;
+	ssize_t wrote;
+
+	/* Sanity check */
+	assert(is_eligible_for_parallel_checkout(pc_item->ce, &pc_item->ca));
+
+	filter = get_stream_filter_ca(&pc_item->ca, &pc_item->ce->oid);
+	if (filter) {
+		if (stream_blob_to_fd(fd, &pc_item->ce->oid, filter, 1)) {
+			/* On error, reset fd to try writing without streaming */
+			if (reset_fd(fd, path))
+				return -1;
+		} else {
+			return 0;
+		}
+	}
+
+	new_blob = read_blob_entry(pc_item->ce, &size);
+	if (!new_blob)
+		return error("cannot read object %s '%s'",
+			     oid_to_hex(&pc_item->ce->oid), path);
+
+	/*
+	 * checkout metadata is used to give context for external process
+	 * filters. Files requiring such filters are not eligible for parallel
+	 * checkout, so pass NULL.
+	 */
+	ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
+					 new_blob, size, &buf, NULL);
+
+	if (ret) {
+		size_t newsize;
+		free(new_blob);
+		new_blob = strbuf_detach(&buf, &newsize);
+		size = newsize;
+	}
+
+	wrote = write_in_full(fd, new_blob, size);
+	free(new_blob);
+	if (wrote < 0)
+		return error("unable to write file '%s'", path);
+
+	return 0;
+}
+
+static int close_and_clear(int *fd)
+{
+	int ret = 0;
+
+	if (*fd >= 0) {
+		ret = close(*fd);
+		*fd = -1;
+	}
+
+	return ret;
+}
+
+static void write_pc_item(struct parallel_checkout_item *pc_item,
+			  struct checkout *state)
+{
+	unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
+	int fd = -1, fstat_done = 0;
+	struct strbuf path = STRBUF_INIT;
+	const char *dir_sep;
+
+	strbuf_add(&path, state->base_dir, state->base_dir_len);
+	strbuf_add(&path, pc_item->ce->name, pc_item->ce->ce_namelen);
+
+	dir_sep = find_last_dir_sep(path.buf);
+
+	/*
+	 * The leading dirs should have been already created by now. But, in
+	 * case of path collisions, one of the dirs could have been replaced by
+	 * a symlink (checked out after we enqueued this entry for parallel
+	 * checkout). Thus, we must check the leading dirs again.
+	 */
+	if (dir_sep && !has_dirs_only_path(path.buf, dir_sep - path.buf,
+					   state->base_dir_len)) {
+		pc_item->status = PC_ITEM_COLLIDED;
+		goto out;
+	}
+
+	fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, mode);
+
+	if (fd < 0) {
+		if (errno == EEXIST || errno == EISDIR) {
+			/*
+			 * Errors which probably represent a path collision.
+			 * Suppress the error message and mark the item to be
+			 * retried later, sequentially. ENOTDIR and ENOENT are
+			 * also interesting, but the above has_dirs_only_path()
+			 * call should have already caught these cases.
+			 */
+			pc_item->status = PC_ITEM_COLLIDED;
+		} else {
+			error_errno("failed to open file '%s'", path.buf);
+			pc_item->status = PC_ITEM_FAILED;
+		}
+		goto out;
+	}
+
+	if (write_pc_item_to_fd(pc_item, fd, path.buf)) {
+		/* Error was already reported. */
+		pc_item->status = PC_ITEM_FAILED;
+		goto out;
+	}
+
+	fstat_done = fstat_checkout_output(fd, state, &pc_item->st);
+
+	if (close_and_clear(&fd)) {
+		error_errno("unable to close file '%s'", path.buf);
+		pc_item->status = PC_ITEM_FAILED;
+		goto out;
+	}
+
+	if (state->refresh_cache && !fstat_done && lstat(path.buf, &pc_item->st) < 0) {
+		error_errno("unable to stat just-written file '%s'",  path.buf);
+		pc_item->status = PC_ITEM_FAILED;
+		goto out;
+	}
+
+	pc_item->status = PC_ITEM_WRITTEN;
+
+out:
+	/*
+	 * No need to check close() return at this point. Either fd is already
+	 * closed, or we are on an error path.
+	 */
+	close_and_clear(&fd);
+	strbuf_release(&path);
+}
+
+static void write_items_sequentially(struct checkout *state)
+{
+	size_t i;
+
+	for (i = 0; i < parallel_checkout.nr; i++)
+		write_pc_item(&parallel_checkout.items[i], state);
+}
+
+int run_parallel_checkout(struct checkout *state)
+{
+	int ret;
+
+	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
+		BUG("cannot run parallel checkout: uninitialized or already running");
+
+	parallel_checkout.status = PC_RUNNING;
+
+	write_items_sequentially(state);
+	ret = handle_results(state);
+
+	finish_parallel_checkout();
+	return ret;
+}
diff --git a/parallel-checkout.h b/parallel-checkout.h
new file mode 100644
index 0000000000..4ad2a519b3
--- /dev/null
+++ b/parallel-checkout.h
@@ -0,0 +1,32 @@
+#ifndef PARALLEL_CHECKOUT_H
+#define PARALLEL_CHECKOUT_H
+
+struct cache_entry;
+struct checkout;
+struct conv_attrs;
+
+enum pc_status {
+	PC_UNINITIALIZED = 0,
+	PC_ACCEPTING_ENTRIES,
+	PC_RUNNING,
+};
+
+enum pc_status parallel_checkout_status(void);
+
+/*
+ * Put parallel checkout into the PC_ACCEPTING_ENTRIES state. Should be used
+ * only when in the PC_UNINITIALIZED state.
+ */
+void init_parallel_checkout(void);
+
+/*
+ * Return -1 if parallel checkout is currently not accepting entries or if the
+ * entry is not eligible for parallel checkout. Otherwise, enqueue the entry
+ * for later write and return 0.
+ */
+int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
+
+/* Write all the queued entries, returning 0 on success.*/
+int run_parallel_checkout(struct checkout *state);
+
+#endif /* PARALLEL_CHECKOUT_H */
diff --git a/unpack-trees.c b/unpack-trees.c
index 8a1afbc1e4..f0430d458d 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -17,6 +17,7 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -441,7 +442,6 @@ static int check_updates(struct unpack_trees_options *o,
 	if (should_update_submodules())
 		load_gitmodules_file(index, &state);
 
-	enable_delayed_checkout(&state);
 	if (has_promisor_remote()) {
 		/*
 		 * Prefetch the objects that are to be checked out in the loop
@@ -464,6 +464,9 @@ static int check_updates(struct unpack_trees_options *o,
 					   to_fetch.oid, to_fetch.nr);
 		oid_array_clear(&to_fetch);
 	}
+
+	enable_delayed_checkout(&state);
+	init_parallel_checkout();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -477,6 +480,7 @@ static int check_updates(struct unpack_trees_options *o,
 		}
 	}
 	stop_progress(&progress);
+	errs |= run_parallel_checkout(&state);
 	errs |= finish_delayed_checkout(&state, NULL);
 	git_attr_set_direction(GIT_ATTR_CHECKIN);
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 2/5] parallel-checkout: make it truly parallel
  2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
  2021-04-08 16:17   ` [PATCH v2 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
@ 2021-04-08 16:17   ` Matheus Tavares
  2021-04-08 16:17   ` [PATCH v2 3/5] parallel-checkout: add configuration options Matheus Tavares
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-08 16:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, gitster, git

Use multiple worker processes to distribute the queued entries and call
write_pc_item() in parallel for them. The items are distributed
uniformly in contiguous chunks. This minimizes the chances of two
workers writing to the same directory simultaneously, which could affect
performance due to lock contention in the kernel. Work stealing (or any
other format of re-distribution) is not implemented yet.

The protocol between the main process and the workers is quite simple.
They exchange binary messages packed in pkt-line format, and use
PKT-FLUSH to mark the end of input (from both sides). The main process
starts the communication by sending N pkt-lines, each corresponding to
an item that needs to be written. These packets contain all the
necessary information to load, smudge, and write the blob associated
with each item. Then it waits for the worker to send back N pkt-lines
containing the results for each item. The resulting packet must contain:
the identification number of the item that it refers to, the status of
the operation, and the lstat() data gathered after writing the file (iff
the operation was successful).

For now, checkout always uses a hardcoded value of 2 workers, only to
demonstrate that the parallel checkout framework correctly divides and
writes the queued entries. The next patch will add user configurations
and define a more reasonable default, based on tests with the said
settings.

Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 .gitignore                 |   1 +
 Makefile                   |   1 +
 builtin.h                  |   1 +
 builtin/checkout--worker.c | 145 ++++++++++++++++++
 git.c                      |   2 +
 parallel-checkout.c        | 300 +++++++++++++++++++++++++++++++++----
 parallel-checkout.h        |  73 ++++++++-
 7 files changed, 496 insertions(+), 27 deletions(-)
 create mode 100644 builtin/checkout--worker.c

diff --git a/.gitignore b/.gitignore
index 3dcdb6bb5a..96c794b1c7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -33,6 +33,7 @@
 /git-check-mailmap
 /git-check-ref-format
 /git-checkout
+/git-checkout--worker
 /git-checkout-index
 /git-cherry
 /git-cherry-pick
diff --git a/Makefile b/Makefile
index 99734cc2ea..84e15f58a0 100644
--- a/Makefile
+++ b/Makefile
@@ -1062,6 +1062,7 @@ BUILTIN_OBJS += builtin/check-attr.o
 BUILTIN_OBJS += builtin/check-ignore.o
 BUILTIN_OBJS += builtin/check-mailmap.o
 BUILTIN_OBJS += builtin/check-ref-format.o
+BUILTIN_OBJS += builtin/checkout--worker.o
 BUILTIN_OBJS += builtin/checkout-index.o
 BUILTIN_OBJS += builtin/checkout.o
 BUILTIN_OBJS += builtin/clean.o
diff --git a/builtin.h b/builtin.h
index b6ce981b73..16ecd5586f 100644
--- a/builtin.h
+++ b/builtin.h
@@ -123,6 +123,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix);
 int cmd_bundle(int argc, const char **argv, const char *prefix);
 int cmd_cat_file(int argc, const char **argv, const char *prefix);
 int cmd_checkout(int argc, const char **argv, const char *prefix);
+int cmd_checkout__worker(int argc, const char **argv, const char *prefix);
 int cmd_checkout_index(int argc, const char **argv, const char *prefix);
 int cmd_check_attr(int argc, const char **argv, const char *prefix);
 int cmd_check_ignore(int argc, const char **argv, const char *prefix);
diff --git a/builtin/checkout--worker.c b/builtin/checkout--worker.c
new file mode 100644
index 0000000000..31e0de2f7e
--- /dev/null
+++ b/builtin/checkout--worker.c
@@ -0,0 +1,145 @@
+#include "builtin.h"
+#include "config.h"
+#include "entry.h"
+#include "parallel-checkout.h"
+#include "parse-options.h"
+#include "pkt-line.h"
+
+static void packet_to_pc_item(const char *buffer, int len,
+			      struct parallel_checkout_item *pc_item)
+{
+	const struct pc_item_fixed_portion *fixed_portion;
+	const char *variant;
+	char *encoding;
+
+	if (len < sizeof(struct pc_item_fixed_portion))
+		BUG("checkout worker received too short item (got %dB, exp %dB)",
+		    len, (int)sizeof(struct pc_item_fixed_portion));
+
+	fixed_portion = (struct pc_item_fixed_portion *)buffer;
+
+	if (len - sizeof(struct pc_item_fixed_portion) !=
+		fixed_portion->name_len + fixed_portion->working_tree_encoding_len)
+		BUG("checkout worker received corrupted item");
+
+	variant = buffer + sizeof(struct pc_item_fixed_portion);
+
+	/*
+	 * Note: the main process uses zero length to communicate that the
+	 * encoding is NULL. There is no use case that requires sending an
+	 * actual empty string, since convert_attrs() never sets
+	 * ca.working_tree_enconding to "".
+	 */
+	if (fixed_portion->working_tree_encoding_len) {
+		encoding = xmemdupz(variant,
+				    fixed_portion->working_tree_encoding_len);
+		variant += fixed_portion->working_tree_encoding_len;
+	} else {
+		encoding = NULL;
+	}
+
+	memset(pc_item, 0, sizeof(*pc_item));
+	pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len);
+	pc_item->ce->ce_namelen = fixed_portion->name_len;
+	pc_item->ce->ce_mode = fixed_portion->ce_mode;
+	memcpy(pc_item->ce->name, variant, pc_item->ce->ce_namelen);
+	oidcpy(&pc_item->ce->oid, &fixed_portion->oid);
+
+	pc_item->id = fixed_portion->id;
+	pc_item->ca.crlf_action = fixed_portion->crlf_action;
+	pc_item->ca.ident = fixed_portion->ident;
+	pc_item->ca.working_tree_encoding = encoding;
+}
+
+static void report_result(struct parallel_checkout_item *pc_item)
+{
+	struct pc_item_result res;
+	size_t size;
+
+	res.id = pc_item->id;
+	res.status = pc_item->status;
+
+	if (pc_item->status == PC_ITEM_WRITTEN) {
+		res.st = pc_item->st;
+		size = sizeof(res);
+	} else {
+		size = PC_ITEM_RESULT_BASE_SIZE;
+	}
+
+	packet_write(1, (const char *)&res, size);
+}
+
+/* Free the worker-side malloced data, but not pc_item itself. */
+static void release_pc_item_data(struct parallel_checkout_item *pc_item)
+{
+	free((char *)pc_item->ca.working_tree_encoding);
+	discard_cache_entry(pc_item->ce);
+}
+
+static void worker_loop(struct checkout *state)
+{
+	struct parallel_checkout_item *items = NULL;
+	size_t i, nr = 0, alloc = 0;
+
+	while (1) {
+		int len = packet_read(0, NULL, NULL, packet_buffer,
+				      sizeof(packet_buffer), 0);
+
+		if (len < 0)
+			BUG("packet_read() returned negative value");
+		else if (!len)
+			break;
+
+		ALLOC_GROW(items, nr + 1, alloc);
+		packet_to_pc_item(packet_buffer, len, &items[nr++]);
+	}
+
+	for (i = 0; i < nr; i++) {
+		struct parallel_checkout_item *pc_item = &items[i];
+		write_pc_item(pc_item, state);
+		report_result(pc_item);
+		release_pc_item_data(pc_item);
+	}
+
+	packet_flush(1);
+
+	free(items);
+}
+
+static const char * const checkout_worker_usage[] = {
+	N_("git checkout--worker [<options>]"),
+	NULL
+};
+
+int cmd_checkout__worker(int argc, const char **argv, const char *prefix)
+{
+	struct checkout state = CHECKOUT_INIT;
+	struct option checkout_worker_options[] = {
+		OPT_STRING(0, "prefix", &state.base_dir, N_("string"),
+			N_("when creating files, prepend <string>")),
+		OPT_END()
+	};
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(checkout_worker_usage,
+				   checkout_worker_options);
+
+	git_config(git_default_config, NULL);
+	argc = parse_options(argc, argv, prefix, checkout_worker_options,
+			     checkout_worker_usage, 0);
+	if (argc > 0)
+		usage_with_options(checkout_worker_usage, checkout_worker_options);
+
+	if (state.base_dir)
+		state.base_dir_len = strlen(state.base_dir);
+
+	/*
+	 * Setting this on a worker won't actually update the index. We just
+	 * need to tell the checkout machinery to lstat() the written entries,
+	 * so that we can send this data back to the main process.
+	 */
+	state.refresh_cache = 1;
+
+	worker_loop(&state);
+	return 0;
+}
diff --git a/git.c b/git.c
index 9bc077a025..532b5b4136 100644
--- a/git.c
+++ b/git.c
@@ -490,6 +490,8 @@ static struct cmd_struct commands[] = {
 	{ "check-mailmap", cmd_check_mailmap, RUN_SETUP },
 	{ "check-ref-format", cmd_check_ref_format, NO_PARSEOPT  },
 	{ "checkout", cmd_checkout, RUN_SETUP | NEED_WORK_TREE },
+	{ "checkout--worker", cmd_checkout__worker,
+		RUN_SETUP | NEED_WORK_TREE | SUPPORT_SUPER_PREFIX },
 	{ "checkout-index", cmd_checkout_index,
 		RUN_SETUP | NEED_WORK_TREE},
 	{ "cherry", cmd_cherry, RUN_SETUP },
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 80c839837b..41c301bbda 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -1,28 +1,14 @@
 #include "cache.h"
 #include "entry.h"
 #include "parallel-checkout.h"
+#include "pkt-line.h"
+#include "run-command.h"
+#include "sigchain.h"
 #include "streaming.h"
 
-enum pc_item_status {
-	PC_ITEM_PENDING = 0,
-	PC_ITEM_WRITTEN,
-	/*
-	 * The entry could not be written because there was another file
-	 * already present in its path or leading directories. Since
-	 * checkout_entry_ca() removes such files from the working tree before
-	 * enqueueing the entry for parallel checkout, it means that there was
-	 * a path collision among the entries being written.
-	 */
-	PC_ITEM_COLLIDED,
-	PC_ITEM_FAILED,
-};
-
-struct parallel_checkout_item {
-	/* pointer to a istate->cache[] entry. Not owned by us. */
-	struct cache_entry *ce;
-	struct conv_attrs ca;
-	struct stat st;
-	enum pc_item_status status;
+struct pc_worker {
+	struct child_process cp;
+	size_t next_item_to_complete, nr_items_to_complete;
 };
 
 struct parallel_checkout {
@@ -59,6 +45,7 @@ static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
 					     const struct conv_attrs *ca)
 {
 	enum conv_attrs_classification c;
+	size_t packed_item_size;
 
 	/*
 	 * Symlinks cannot be checked out in parallel as, in case of path
@@ -69,6 +56,18 @@ static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
 	if (!S_ISREG(ce->ce_mode))
 		return 0;
 
+	packed_item_size = sizeof(struct pc_item_fixed_portion) + ce->ce_namelen +
+		(ca->working_tree_encoding ? strlen(ca->working_tree_encoding) : 0);
+
+	/*
+	 * The amount of data we send to the workers per checkout item is
+	 * typically small (75~300B). So unless we find an insanely huge path
+	 * of 64KB, we should never reach the 65KB limit of one pkt-line. If
+	 * that does happen, we let the sequential code handle the item.
+	 */
+	if (packed_item_size > LARGE_PACKET_DATA_MAX)
+		return 0;
+
 	c = classify_conv_attrs(ca);
 	switch (c) {
 	case CA_CLASS_INCORE:
@@ -121,10 +120,12 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
 	ALLOC_GROW(parallel_checkout.items, parallel_checkout.nr + 1,
 		   parallel_checkout.alloc);
 
-	pc_item = &parallel_checkout.items[parallel_checkout.nr++];
+	pc_item = &parallel_checkout.items[parallel_checkout.nr];
 	pc_item->ce = ce;
 	memcpy(&pc_item->ca, ca, sizeof(pc_item->ca));
 	pc_item->status = PC_ITEM_PENDING;
+	pc_item->id = parallel_checkout.nr;
+	parallel_checkout.nr++;
 
 	return 0;
 }
@@ -236,7 +237,8 @@ static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
 	/*
 	 * checkout metadata is used to give context for external process
 	 * filters. Files requiring such filters are not eligible for parallel
-	 * checkout, so pass NULL.
+	 * checkout, so pass NULL. Note: if that changes, the metadata must also
+	 * be passed from the main process to the workers.
 	 */
 	ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
 					 new_blob, size, &buf, NULL);
@@ -268,8 +270,8 @@ static int close_and_clear(int *fd)
 	return ret;
 }
 
-static void write_pc_item(struct parallel_checkout_item *pc_item,
-			  struct checkout *state)
+void write_pc_item(struct parallel_checkout_item *pc_item,
+		   struct checkout *state)
 {
 	unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
 	int fd = -1, fstat_done = 0;
@@ -343,6 +345,240 @@ static void write_pc_item(struct parallel_checkout_item *pc_item,
 	strbuf_release(&path);
 }
 
+static void send_one_item(int fd, struct parallel_checkout_item *pc_item)
+{
+	size_t len_data;
+	char *data, *variant;
+	struct pc_item_fixed_portion *fixed_portion;
+	const char *working_tree_encoding = pc_item->ca.working_tree_encoding;
+	size_t name_len = pc_item->ce->ce_namelen;
+	size_t working_tree_encoding_len = working_tree_encoding ?
+					   strlen(working_tree_encoding) : 0;
+
+	/*
+	 * Any changes in the calculation of the message size must also be made
+	 * in is_eligible_for_parallel_checkout().
+	 */
+	len_data = sizeof(struct pc_item_fixed_portion) + name_len +
+		   working_tree_encoding_len;
+
+	data = xcalloc(1, len_data);
+
+	fixed_portion = (struct pc_item_fixed_portion *)data;
+	fixed_portion->id = pc_item->id;
+	fixed_portion->ce_mode = pc_item->ce->ce_mode;
+	fixed_portion->crlf_action = pc_item->ca.crlf_action;
+	fixed_portion->ident = pc_item->ca.ident;
+	fixed_portion->name_len = name_len;
+	fixed_portion->working_tree_encoding_len = working_tree_encoding_len;
+	/*
+	 * We use hashcpy() instead of oidcpy() because the hash[] positions
+	 * after `the_hash_algo->rawsz` might not be initialized. And Valgrind
+	 * would complain about passing uninitialized bytes to a syscall
+	 * (write(2)). There is no real harm in this case, but the warning could
+	 * hinder the detection of actual errors.
+	 */
+	hashcpy(fixed_portion->oid.hash, pc_item->ce->oid.hash);
+
+	variant = data + sizeof(*fixed_portion);
+	if (working_tree_encoding_len) {
+		memcpy(variant, working_tree_encoding, working_tree_encoding_len);
+		variant += working_tree_encoding_len;
+	}
+	memcpy(variant, pc_item->ce->name, name_len);
+
+	packet_write(fd, data, len_data);
+
+	free(data);
+}
+
+static void send_batch(int fd, size_t start, size_t nr)
+{
+	size_t i;
+	sigchain_push(SIGPIPE, SIG_IGN);
+	for (i = 0; i < nr; i++)
+		send_one_item(fd, &parallel_checkout.items[start + i]);
+	packet_flush(fd);
+	sigchain_pop(SIGPIPE);
+}
+
+static struct pc_worker *setup_workers(struct checkout *state, int num_workers)
+{
+	struct pc_worker *workers;
+	int i, workers_with_one_extra_item;
+	size_t base_batch_size, batch_beginning = 0;
+
+	ALLOC_ARRAY(workers, num_workers);
+
+	for (i = 0; i < num_workers; i++) {
+		struct child_process *cp = &workers[i].cp;
+
+		child_process_init(cp);
+		cp->git_cmd = 1;
+		cp->in = -1;
+		cp->out = -1;
+		cp->clean_on_exit = 1;
+		strvec_push(&cp->args, "checkout--worker");
+		if (state->base_dir_len)
+			strvec_pushf(&cp->args, "--prefix=%s", state->base_dir);
+		if (start_command(cp))
+			die("failed to spawn checkout worker");
+	}
+
+	base_batch_size = parallel_checkout.nr / num_workers;
+	workers_with_one_extra_item = parallel_checkout.nr % num_workers;
+
+	for (i = 0; i < num_workers; i++) {
+		struct pc_worker *worker = &workers[i];
+		size_t batch_size = base_batch_size;
+
+		/* distribute the extra work evenly */
+		if (i < workers_with_one_extra_item)
+			batch_size++;
+
+		send_batch(worker->cp.in, batch_beginning, batch_size);
+		worker->next_item_to_complete = batch_beginning;
+		worker->nr_items_to_complete = batch_size;
+
+		batch_beginning += batch_size;
+	}
+
+	return workers;
+}
+
+static void finish_workers(struct pc_worker *workers, int num_workers)
+{
+	int i;
+
+	/*
+	 * Close pipes before calling finish_command() to let the workers
+	 * exit asynchronously and avoid spending extra time on wait().
+	 */
+	for (i = 0; i < num_workers; i++) {
+		struct child_process *cp = &workers[i].cp;
+		if (cp->in >= 0)
+			close(cp->in);
+		if (cp->out >= 0)
+			close(cp->out);
+	}
+
+	for (i = 0; i < num_workers; i++) {
+		int rc = finish_command(&workers[i].cp);
+		if (rc > 128) {
+			/*
+			 * For a normal non-zero exit, the worker should have
+			 * already printed something useful to stderr. But a
+			 * death by signal should be mentioned to the user.
+			 */
+			error("checkout worker %d died of signal %d", i, rc - 128);
+		}
+	}
+
+	free(workers);
+}
+
+static inline void assert_pc_item_result_size(int got, int exp)
+{
+	if (got != exp)
+		BUG("wrong result size from checkout worker (got %dB, exp %dB)",
+		    got, exp);
+}
+
+static void parse_and_save_result(const char *buffer, int len,
+				  struct pc_worker *worker)
+{
+	struct pc_item_result *res;
+	struct parallel_checkout_item *pc_item;
+	struct stat *st = NULL;
+
+	if (len < PC_ITEM_RESULT_BASE_SIZE)
+		BUG("too short result from checkout worker (got %dB, exp >=%dB)",
+		    len, (int)PC_ITEM_RESULT_BASE_SIZE);
+
+	res = (struct pc_item_result *)buffer;
+
+	/*
+	 * Worker should send either the full result struct on success, or
+	 * just the base (i.e. no stat data), otherwise.
+	 */
+	if (res->status == PC_ITEM_WRITTEN) {
+		assert_pc_item_result_size(len, (int)sizeof(struct pc_item_result));
+		st = &res->st;
+	} else {
+		assert_pc_item_result_size(len, (int)PC_ITEM_RESULT_BASE_SIZE);
+	}
+
+	if (!worker->nr_items_to_complete)
+		BUG("received result from supposedly finished checkout worker");
+	if (res->id != worker->next_item_to_complete)
+		BUG("unexpected item id from checkout worker (got %"PRIuMAX", exp %"PRIuMAX")",
+		    (uintmax_t)res->id, (uintmax_t)worker->next_item_to_complete);
+
+	worker->next_item_to_complete++;
+	worker->nr_items_to_complete--;
+
+	pc_item = &parallel_checkout.items[res->id];
+	pc_item->status = res->status;
+	if (st)
+		pc_item->st = *st;
+}
+
+static void gather_results_from_workers(struct pc_worker *workers,
+					int num_workers)
+{
+	int i, active_workers = num_workers;
+	struct pollfd *pfds;
+
+	CALLOC_ARRAY(pfds, num_workers);
+	for (i = 0; i < num_workers; i++) {
+		pfds[i].fd = workers[i].cp.out;
+		pfds[i].events = POLLIN;
+	}
+
+	while (active_workers) {
+		int nr = poll(pfds, num_workers, -1);
+
+		if (nr < 0) {
+			if (errno == EINTR)
+				continue;
+			die_errno("failed to poll checkout workers");
+		}
+
+		for (i = 0; i < num_workers && nr > 0; i++) {
+			struct pc_worker *worker = &workers[i];
+			struct pollfd *pfd = &pfds[i];
+
+			if (!pfd->revents)
+				continue;
+
+			if (pfd->revents & POLLIN) {
+				int len = packet_read(pfd->fd, NULL, NULL,
+						      packet_buffer,
+						      sizeof(packet_buffer), 0);
+
+				if (len < 0) {
+					BUG("packet_read() returned negative value");
+				} else if (!len) {
+					pfd->fd = -1;
+					active_workers--;
+				} else {
+					parse_and_save_result(packet_buffer,
+							      len, worker);
+				}
+			} else if (pfd->revents & POLLHUP) {
+				pfd->fd = -1;
+				active_workers--;
+			} else if (pfd->revents & (POLLNVAL | POLLERR)) {
+				die("error polling from checkout worker");
+			}
+
+			nr--;
+		}
+	}
+
+	free(pfds);
+}
+
 static void write_items_sequentially(struct checkout *state)
 {
 	size_t i;
@@ -351,16 +587,28 @@ static void write_items_sequentially(struct checkout *state)
 		write_pc_item(&parallel_checkout.items[i], state);
 }
 
+static const int DEFAULT_NUM_WORKERS = 2;
+
 int run_parallel_checkout(struct checkout *state)
 {
-	int ret;
+	int ret, num_workers = DEFAULT_NUM_WORKERS;
 
 	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
 		BUG("cannot run parallel checkout: uninitialized or already running");
 
 	parallel_checkout.status = PC_RUNNING;
 
-	write_items_sequentially(state);
+	if (parallel_checkout.nr < num_workers)
+		num_workers = parallel_checkout.nr;
+
+	if (num_workers <= 1) {
+		write_items_sequentially(state);
+	} else {
+		struct pc_worker *workers = setup_workers(state, num_workers);
+		gather_results_from_workers(workers, num_workers);
+		finish_workers(workers, num_workers);
+	}
+
 	ret = handle_results(state);
 
 	finish_parallel_checkout();
diff --git a/parallel-checkout.h b/parallel-checkout.h
index 4ad2a519b3..ec58716519 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -1,9 +1,14 @@
 #ifndef PARALLEL_CHECKOUT_H
 #define PARALLEL_CHECKOUT_H
 
+#include "convert.h"
+
 struct cache_entry;
 struct checkout;
-struct conv_attrs;
+
+/****************************************************************
+ * Users of parallel checkout
+ ****************************************************************/
 
 enum pc_status {
 	PC_UNINITIALIZED = 0,
@@ -29,4 +34,70 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
 /* Write all the queued entries, returning 0 on success.*/
 int run_parallel_checkout(struct checkout *state);
 
+/****************************************************************
+ * Interface with checkout--worker
+ ****************************************************************/
+
+enum pc_item_status {
+	PC_ITEM_PENDING = 0,
+	PC_ITEM_WRITTEN,
+	/*
+	 * The entry could not be written because there was another file
+	 * already present in its path or leading directories. Since
+	 * checkout_entry_ca() removes such files from the working tree before
+	 * enqueueing the entry for parallel checkout, it means that there was
+	 * a path collision among the entries being written.
+	 */
+	PC_ITEM_COLLIDED,
+	PC_ITEM_FAILED,
+};
+
+struct parallel_checkout_item {
+	/*
+	 * In main process ce points to a istate->cache[] entry. Thus, it's not
+	 * owned by us. In workers they own the memory, which *must be* released.
+	 */
+	struct cache_entry *ce;
+	struct conv_attrs ca;
+	size_t id; /* position in parallel_checkout.items[] of main process */
+
+	/* Output fields, sent from workers. */
+	enum pc_item_status status;
+	struct stat st;
+};
+
+/*
+ * The fixed-size portion of `struct parallel_checkout_item` that is sent to the
+ * workers. Following this will be 2 strings: ca.working_tree_encoding and
+ * ce.name; These are NOT null terminated, since we have the size in the fixed
+ * portion.
+ *
+ * Note that not all fields of conv_attrs and cache_entry are passed, only the
+ * ones that will be required by the workers to smudge and write the entry.
+ */
+struct pc_item_fixed_portion {
+	size_t id;
+	struct object_id oid;
+	unsigned int ce_mode;
+	enum convert_crlf_action crlf_action;
+	int ident;
+	size_t working_tree_encoding_len;
+	size_t name_len;
+};
+
+/*
+ * The fields of `struct parallel_checkout_item` that are returned by the
+ * workers. Note: `st` must be the last one, as it is omitted on error.
+ */
+struct pc_item_result {
+	size_t id;
+	enum pc_item_status status;
+	struct stat st;
+};
+
+#define PC_ITEM_RESULT_BASE_SIZE offsetof(struct pc_item_result, st)
+
+void write_pc_item(struct parallel_checkout_item *pc_item,
+		   struct checkout *state);
+
 #endif /* PARALLEL_CHECKOUT_H */
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 3/5] parallel-checkout: add configuration options
  2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
  2021-04-08 16:17   ` [PATCH v2 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
  2021-04-08 16:17   ` [PATCH v2 2/5] parallel-checkout: make it truly parallel Matheus Tavares
@ 2021-04-08 16:17   ` Matheus Tavares
  2021-04-08 16:17   ` [PATCH v2 4/5] parallel-checkout: support progress displaying Matheus Tavares
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-08 16:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, gitster, git

Make parallel checkout configurable by introducing two new settings:
checkout.workers and checkout.thresholdForParallelism. The first defines
the number of workers (where one means sequential checkout), and the
second defines the minimum number of entries to attempt parallel
checkout.

To decide the default value for checkout.workers, the parallel version
was benchmarked during three operations in the linux repo, with cold
cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and
checking out v5.8 from v5.7 (checkout II). The four tables below show
the mean run times and standard deviations for 5 runs in: a local file
system on SSD, a local file system on HDD, a Linux NFS server, and
Amazon EFS (all on Linux). Each parallel checkout test was executed with
the number of workers that brings the best overall results in that
environment.

Local SSD:
             Sequential             10 workers            Speedup
Clone        8.805 s ± 0.043 s      3.564 s ± 0.041 s     2.47 ± 0.03
Checkout I   9.678 s ± 0.057 s      4.486 s ± 0.050 s     2.16 ± 0.03
Checkout II  5.034 s ± 0.072 s      3.021 s ± 0.038 s     1.67 ± 0.03

Local HDD:
             Sequential             10 workers             Speedup
Clone        32.288 s ± 0.580 s     30.724 s ± 0.522 s    1.05 ± 0.03
Checkout I   54.172 s ±  7.119 s    54.429 s ± 6.738 s    1.00 ± 0.18
Checkout II  40.465 s ± 2.402 s     38.682 s ± 1.365 s    1.05 ± 0.07

Linux NFS server (v4.1, on EBS, single availability zone):

             Sequential             32 workers            Speedup
Clone        240.368 s ± 6.347 s    57.349 s ± 0.870 s    4.19 ± 0.13
Checkout I   242.862 s ± 2.215 s    58.700 s ± 0.904 s    4.14 ± 0.07
Checkout II  65.751 s ± 1.577 s     23.820 s ± 0.407 s    2.76 ± 0.08

EFS (v4.1, replicated over multiple availability zones):

             Sequential             32 workers            Speedup
Clone        922.321 s ± 2.274 s    210.453 s ± 3.412 s   4.38 ± 0.07
Checkout I   1011.300 s ± 7.346 s   297.828 s ± 0.964 s   3.40 ± 0.03
Checkout II  294.104 s ± 1.836 s    126.017 s ± 1.190 s   2.33 ± 0.03

The above benchmarks show that parallel checkout is most effective on
repositories located on an SSD or over a distributed file system. For
local file systems on spinning disks, and/or older machines, the
parallelism does not always bring a good performance. For this reason,
the default value for checkout.workers is one, a.k.a. sequential
checkout.

To decide the default value for checkout.thresholdForParallelism,
another benchmark was executed in the "Local SSD" setup, where parallel
checkout showed to be beneficial. This time, we compared the runtime of
a `git checkout -f`, with and without parallelism, after randomly
removing an increasing number of files from the Linux working tree. The
"sequential fallback" column below corresponds to the executions where
checkout.workers was 10 but checkout.thresholdForParallelism was equal
to the number of to-be-updated files plus one (so that we end up writing
sequentially). Each test case was sampled 15 times, and each sample had
a randomly different set of files removed. Here are the results:

             sequential fallback   10 workers           speedup
10   files    772.3 ms ± 12.6 ms   769.0 ms ± 13.6 ms   1.00 ± 0.02
20   files    780.5 ms ± 15.8 ms   775.2 ms ±  9.2 ms   1.01 ± 0.02
50   files    806.2 ms ± 13.8 ms   767.4 ms ±  8.5 ms   1.05 ± 0.02
100  files    833.7 ms ± 21.4 ms   750.5 ms ± 16.8 ms   1.11 ± 0.04
200  files    897.6 ms ± 30.9 ms   730.5 ms ± 14.7 ms   1.23 ± 0.05
500  files   1035.4 ms ± 48.0 ms   677.1 ms ± 22.3 ms   1.53 ± 0.09
1000 files   1244.6 ms ± 35.6 ms   654.0 ms ± 38.3 ms   1.90 ± 0.12
2000 files   1488.8 ms ± 53.4 ms   658.8 ms ± 23.8 ms   2.26 ± 0.12

From the above numbers, 100 files seems to be a reasonable default value
for the threshold setting.

Note: Up to 1000 files, we observe a drop in the execution time of the
parallel code with an increase in the number of files. This is a rather
odd behavior, but it was observed in multiple repetitions. Above 1000
files, the execution time increases according to the number of files, as
one would expect.

About the test environments: Local SSD tests were executed on an
i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local
HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores
with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running
Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge
instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large
instance with 2 vCPUSs and a 1 TB EBS GP2 volume. Before each timing,
the linux repository was removed (or checked out back to its previous
state), and `sync && sysctl vm.drop_caches=3` was executed.

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/config/checkout.txt | 21 +++++++++++++++++++++
 parallel-checkout.c               | 24 +++++++++++++++++++-----
 parallel-checkout.h               |  9 +++++++--
 unpack-trees.c                    | 10 +++++++---
 4 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/Documentation/config/checkout.txt b/Documentation/config/checkout.txt
index 2cddf7b4b4..bfbca90f0e 100644
--- a/Documentation/config/checkout.txt
+++ b/Documentation/config/checkout.txt
@@ -21,3 +21,24 @@ checkout.guess::
 	Provides the default value for the `--guess` or `--no-guess`
 	option in `git checkout` and `git switch`. See
 	linkgit:git-switch[1] and linkgit:git-checkout[1].
+
+checkout.workers::
+	The number of parallel workers to use when updating the working tree.
+	The default is one, i.e. sequential execution. If set to a value less
+	than one, Git will use as many workers as the number of logical cores
+	available. This setting and `checkout.thresholdForParallelism` affect
+	all commands that perform checkout. E.g. checkout, clone, reset,
+	sparse-checkout, etc.
++
+Note: parallel checkout usually delivers better performance for repositories
+located on SSDs or over NFS. For repositories on spinning disks and/or machines
+with a small number of cores, the default sequential checkout often performs
+better. The size and compression level of a repository might also influence how
+well the parallel version performs.
+
+checkout.thresholdForParallelism::
+	When running parallel checkout with a small number of files, the cost
+	of subprocess spawning and inter-process communication might outweigh
+	the parallelization gains. This setting allows to define the minimum
+	number of files for which parallel checkout should be attempted. The
+	default is 100.
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 41c301bbda..d6a0f31664 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -1,10 +1,12 @@
 #include "cache.h"
+#include "config.h"
 #include "entry.h"
 #include "parallel-checkout.h"
 #include "pkt-line.h"
 #include "run-command.h"
 #include "sigchain.h"
 #include "streaming.h"
+#include "thread-utils.h"
 
 struct pc_worker {
 	struct child_process cp;
@@ -24,6 +26,20 @@ enum pc_status parallel_checkout_status(void)
 	return parallel_checkout.status;
 }
 
+static const int DEFAULT_THRESHOLD_FOR_PARALLELISM = 100;
+static const int DEFAULT_NUM_WORKERS = 1;
+
+void get_parallel_checkout_configs(int *num_workers, int *threshold)
+{
+	if (git_config_get_int("checkout.workers", num_workers))
+		*num_workers = DEFAULT_NUM_WORKERS;
+	else if (*num_workers < 1)
+		*num_workers = online_cpus();
+
+	if (git_config_get_int("checkout.thresholdForParallelism", threshold))
+		*threshold = DEFAULT_THRESHOLD_FOR_PARALLELISM;
+}
+
 void init_parallel_checkout(void)
 {
 	if (parallel_checkout.status != PC_UNINITIALIZED)
@@ -587,11 +603,9 @@ static void write_items_sequentially(struct checkout *state)
 		write_pc_item(&parallel_checkout.items[i], state);
 }
 
-static const int DEFAULT_NUM_WORKERS = 2;
-
-int run_parallel_checkout(struct checkout *state)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
 {
-	int ret, num_workers = DEFAULT_NUM_WORKERS;
+	int ret;
 
 	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
 		BUG("cannot run parallel checkout: uninitialized or already running");
@@ -601,7 +615,7 @@ int run_parallel_checkout(struct checkout *state)
 	if (parallel_checkout.nr < num_workers)
 		num_workers = parallel_checkout.nr;
 
-	if (num_workers <= 1) {
+	if (num_workers <= 1 || parallel_checkout.nr < threshold) {
 		write_items_sequentially(state);
 	} else {
 		struct pc_worker *workers = setup_workers(state, num_workers);
diff --git a/parallel-checkout.h b/parallel-checkout.h
index ec58716519..2a68ab954d 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -17,6 +17,7 @@ enum pc_status {
 };
 
 enum pc_status parallel_checkout_status(void);
+void get_parallel_checkout_configs(int *num_workers, int *threshold);
 
 /*
  * Put parallel checkout into the PC_ACCEPTING_ENTRIES state. Should be used
@@ -31,8 +32,12 @@ void init_parallel_checkout(void);
  */
 int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
 
-/* Write all the queued entries, returning 0 on success.*/
-int run_parallel_checkout(struct checkout *state);
+/*
+ * Write all the queued entries, returning 0 on success. If the number of
+ * entries is smaller than the specified threshold, the operation is performed
+ * sequentially.
+ */
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
 
 /****************************************************************
  * Interface with checkout--worker
diff --git a/unpack-trees.c b/unpack-trees.c
index f0430d458d..0669748f21 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -399,7 +399,7 @@ static int check_updates(struct unpack_trees_options *o,
 	int errs = 0;
 	struct progress *progress;
 	struct checkout state = CHECKOUT_INIT;
-	int i;
+	int i, pc_workers, pc_threshold;
 
 	trace_performance_enter();
 	state.force = 1;
@@ -465,8 +465,11 @@ static int check_updates(struct unpack_trees_options *o,
 		oid_array_clear(&to_fetch);
 	}
 
+	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+
 	enable_delayed_checkout(&state);
-	init_parallel_checkout();
+	if (pc_workers > 1)
+		init_parallel_checkout();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -480,7 +483,8 @@ static int check_updates(struct unpack_trees_options *o,
 		}
 	}
 	stop_progress(&progress);
-	errs |= run_parallel_checkout(&state);
+	if (pc_workers > 1)
+		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
 	errs |= finish_delayed_checkout(&state, NULL);
 	git_attr_set_direction(GIT_ATTR_CHECKIN);
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 4/5] parallel-checkout: support progress displaying
  2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
                     ` (2 preceding siblings ...)
  2021-04-08 16:17   ` [PATCH v2 3/5] parallel-checkout: add configuration options Matheus Tavares
@ 2021-04-08 16:17   ` Matheus Tavares
  2021-04-08 16:17   ` [PATCH v2 5/5] parallel-checkout: add design documentation Matheus Tavares
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-08 16:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, gitster, git

Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 parallel-checkout.c | 34 +++++++++++++++++++++++++++++++---
 parallel-checkout.h |  5 ++++-
 unpack-trees.c      | 11 ++++++++---
 3 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/parallel-checkout.c b/parallel-checkout.c
index d6a0f31664..d09ad8a7c6 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -3,6 +3,7 @@
 #include "entry.h"
 #include "parallel-checkout.h"
 #include "pkt-line.h"
+#include "progress.h"
 #include "run-command.h"
 #include "sigchain.h"
 #include "streaming.h"
@@ -17,6 +18,8 @@ struct parallel_checkout {
 	enum pc_status status;
 	struct parallel_checkout_item *items; /* The parallel checkout queue. */
 	size_t nr, alloc;
+	struct progress *progress;
+	unsigned int *progress_cnt;
 };
 
 static struct parallel_checkout parallel_checkout;
@@ -146,6 +149,20 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
 	return 0;
 }
 
+size_t pc_queue_size(void)
+{
+	return parallel_checkout.nr;
+}
+
+static void advance_progress_meter(void)
+{
+	if (parallel_checkout.progress) {
+		(*parallel_checkout.progress_cnt)++;
+		display_progress(parallel_checkout.progress,
+				 *parallel_checkout.progress_cnt);
+	}
+}
+
 static int handle_results(struct checkout *state)
 {
 	int ret = 0;
@@ -194,6 +211,7 @@ static int handle_results(struct checkout *state)
 			 */
 			ret |= checkout_entry_ca(pc_item->ce, &pc_item->ca,
 						 state, NULL, NULL);
+			advance_progress_meter();
 			break;
 		case PC_ITEM_PENDING:
 			have_pending = 1;
@@ -537,6 +555,9 @@ static void parse_and_save_result(const char *buffer, int len,
 	pc_item->status = res->status;
 	if (st)
 		pc_item->st = *st;
+
+	if (res->status != PC_ITEM_COLLIDED)
+		advance_progress_meter();
 }
 
 static void gather_results_from_workers(struct pc_worker *workers,
@@ -599,11 +620,16 @@ static void write_items_sequentially(struct checkout *state)
 {
 	size_t i;
 
-	for (i = 0; i < parallel_checkout.nr; i++)
-		write_pc_item(&parallel_checkout.items[i], state);
+	for (i = 0; i < parallel_checkout.nr; i++) {
+		struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+		write_pc_item(pc_item, state);
+		if (pc_item->status != PC_ITEM_COLLIDED)
+			advance_progress_meter();
+	}
 }
 
-int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold,
+			  struct progress *progress, unsigned int *progress_cnt)
 {
 	int ret;
 
@@ -611,6 +637,8 @@ int run_parallel_checkout(struct checkout *state, int num_workers, int threshold
 		BUG("cannot run parallel checkout: uninitialized or already running");
 
 	parallel_checkout.status = PC_RUNNING;
+	parallel_checkout.progress = progress;
+	parallel_checkout.progress_cnt = progress_cnt;
 
 	if (parallel_checkout.nr < num_workers)
 		num_workers = parallel_checkout.nr;
diff --git a/parallel-checkout.h b/parallel-checkout.h
index 2a68ab954d..80f539bcb7 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -5,6 +5,7 @@
 
 struct cache_entry;
 struct checkout;
+struct progress;
 
 /****************************************************************
  * Users of parallel checkout
@@ -31,13 +32,15 @@ void init_parallel_checkout(void);
  * for later write and return 0.
  */
 int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
+size_t pc_queue_size(void);
 
 /*
  * Write all the queued entries, returning 0 on success. If the number of
  * entries is smaller than the specified threshold, the operation is performed
  * sequentially.
  */
-int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold,
+			  struct progress *progress, unsigned int *progress_cnt);
 
 /****************************************************************
  * Interface with checkout--worker
diff --git a/unpack-trees.c b/unpack-trees.c
index 0669748f21..4b77e52c6b 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -474,17 +474,22 @@ static int check_updates(struct unpack_trees_options *o,
 		struct cache_entry *ce = index->cache[i];
 
 		if (ce->ce_flags & CE_UPDATE) {
+			size_t last_pc_queue_size = pc_queue_size();
+
 			if (ce->ce_flags & CE_WT_REMOVE)
 				BUG("both update and delete flags are set on %s",
 				    ce->name);
-			display_progress(progress, ++cnt);
 			ce->ce_flags &= ~CE_UPDATE;
 			errs |= checkout_entry(ce, &state, NULL, NULL);
+
+			if (last_pc_queue_size == pc_queue_size())
+				display_progress(progress, ++cnt);
 		}
 	}
-	stop_progress(&progress);
 	if (pc_workers > 1)
-		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
+		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+					      progress, &cnt);
+	stop_progress(&progress);
 	errs |= finish_delayed_checkout(&state, NULL);
 	git_attr_set_direction(GIT_ATTR_CHECKIN);
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 5/5] parallel-checkout: add design documentation
  2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
                     ` (3 preceding siblings ...)
  2021-04-08 16:17   ` [PATCH v2 4/5] parallel-checkout: support progress displaying Matheus Tavares
@ 2021-04-08 16:17   ` Matheus Tavares
  2021-04-08 19:52   ` [PATCH v2 0/5] Parallel Checkout (part 2) Junio C Hamano
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-08 16:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, gitster, git

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/Makefile                        |   1 +
 Documentation/technical/parallel-checkout.txt | 268 ++++++++++++++++++
 2 files changed, 269 insertions(+)
 create mode 100644 Documentation/technical/parallel-checkout.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 81d1bf7a04..af236927c9 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -90,6 +90,7 @@ TECH_DOCS += technical/multi-pack-index
 TECH_DOCS += technical/pack-format
 TECH_DOCS += technical/pack-heuristics
 TECH_DOCS += technical/pack-protocol
+TECH_DOCS += technical/parallel-checkout
 TECH_DOCS += technical/partial-clone
 TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
diff --git a/Documentation/technical/parallel-checkout.txt b/Documentation/technical/parallel-checkout.txt
new file mode 100644
index 0000000000..b8703eae49
--- /dev/null
+++ b/Documentation/technical/parallel-checkout.txt
@@ -0,0 +1,268 @@
+Parallel Checkout Design Notes
+==============================
+
+The "Parallel Checkout" feature attempts to use multiple processes to
+parallelize the work of uncompressing the blobs, applying in-core
+filters, and writing the resulting contents to the working tree during a
+checkout operation. It can be used by all checkout-related commands,
+such as `clone`, `checkout`, `reset`, `sparse-checkout`, and others.
+
+These commands share the following basic structure:
+
+* Step 1: Read the current index file into memory.
+
+* Step 2: Modify the in-memory index based upon the command, and
+  temporarily mark all cache entries that need to be updated.
+
+* Step 3: Populate the working tree to match the new candidate index.
+  This includes iterating over all of the to-be-updated cache entries
+  and delete, create, or overwrite the associated files in the working
+  tree.
+
+* Step 4: Write the new index to disk.
+
+Step 3 is the focus of the "parallel checkout" effort described here.
+It dominates the execution time for most of the above command types.
+
+Sequential Implementation
+-------------------------
+
+For the purposes of discussion here, the current sequential
+implementation of Step 3 is divided in 3 parts, each one implemented in
+its own function:
+
+* Step 3a: `unpack-trees.c:check_updates()` contains a series of
+  sequential loops iterating over the `cache_entry`'s array. The main
+  loop in this function calls the Step 3b function for each of the
+  to-be-updated entries.
+
+* Step 3b: `entry.c:checkout_entry()` examines the existing working tree
+  for file conflicts, collisions, and unsaved changes. It removes files
+  and creates leading directories as necessary. It calls the Step 3c
+  function for each entry to be written.
+
+* Step 3c: `entry.c:write_entry()` loads the blob into memory, smudges
+  it if necessary, creates the file in the working tree, writes the
+  smudged contents, calls `fstat()` or `lstat()`, and updates the
+  associated `cache_entry` struct with the stat information gathered.
+
+It wouldn't be safe to perform Step 3b in parallel, as there could be
+race conditions between file creations and removals. Instead, the
+parallel checkout framework lets the sequential code handle Step 3b,
+and use parallel workers to replace the sequential
+`entry.c:write_entry()` calls from Step 3c.
+
+Rejected Multi-Threaded Solution
+--------------------------------
+
+The most "straightforward" implementation would be to spread the set of
+to-be-updated cache entries across multiple threads. But due to the
+thread-unsafe functions in the ODB code, we would have to use locks to
+coordinate the parallel operation. An early prototype of this solution
+showed that the multi-threaded checkout would bring performance
+improvements over the sequential code, but there was still too much lock
+contention. A `perf` profiling indicated that around 20% of the runtime
+during a local Linux clone (on an SSD) was spent in locking functions.
+For this reason this approach was rejected in favor of using multiple
+child processes, which led to a better performance.
+
+Multi-Process Solution
+----------------------
+
+Parallel checkout alters the aforementioned Step 3 to use multiple
+`checkout--worker` background processes to distribute the work. The
+long-running worker processes are controlled by the foreground Git
+command using the existing run-command API.
+
+Overview
+~~~~~~~~
+
+Step 3b is only slightly altered; for each entry to be checked out, the
+main process performs the following steps:
+
+* M1: Check whether there is any untracked or unclean file in the
+  working tree which would be overwritten by this entry, and decide
+  whether to proceed (removing the file(s)) or not.
+
+* M2: Create the leading directories.
+
+* M3: Load the conversion attributes for the entry's path.
+
+* M4: Check, based on the entry's type and conversion attributes,
+  whether the entry is eligible for parallel checkout (more on this
+  later). If it is eligible, enqueue the entry and the loaded
+  attributes to later write the entry in parallel. If not, write the
+  entry right away, using the default sequential code.
+
+Note: we save the conversion attributes associated with each entry
+because the workers don't have access to the main process' index state,
+so they can't load the attributes by themselves (and the attributes are
+needed to properly smudge the entry). Additionally, this has a positive
+impact on performance as (1) we don't need to load the attributes twice
+and (2) the attributes machinery is optimized to handle paths in
+sequential order.
+
+After all entries have passed through the above steps, the main process
+checks if the number of enqueued entries is sufficient to spread among
+the workers. If not, it just writes them sequentially. Otherwise, it
+spawns the workers and distributes the queued entries uniformly in
+continuous chunks. This aims to minimize the chances of two workers
+writing to the same directory simultaneously, which could increase lock
+contention in the kernel.
+
+Then, for each assigned item, each worker:
+
+* W1: Checks if there is any non-directory file in the leading part of
+  the entry's path or if there already exists a file at the entry' path.
+  If so, mark the entry with `PC_ITEM_COLLIDED` and skip it (more on
+  this later).
+
+* W2: Creates the file (with O_CREAT and O_EXCL).
+
+* W3: Loads the blob into memory (inflating and delta reconstructing
+  it).
+
+* W4: Applies any required in-process filter, like end-of-line
+  conversion and re-encoding.
+
+* W5: Writes the result to the file descriptor opened at W2.
+
+* W6: Calls `fstat()` or lstat()` on the just-written path, and sends
+  the result back to the main process, together with the end status of
+  the operation and the item's identification number.
+
+Note that, when possible, steps W3 to W5 are delegated to the streaming
+machinery, removing the need to keep the entire blob in memory.
+
+Also note that the workers *never* remove any file. As mentioned
+earlier, it is the responsibility of the main process to remove any file
+that blocks the checkout operation (or abort if removing the file(s)
+would cause data loss and the user didn't ask to `--force`). This is
+crucial to avoid race conditions and also to properly detect path
+collisions at Step W1.
+
+After the workers finish writing the items and sending back the required
+information, the main process handles the results in two steps:
+
+- First, it updates the in-memory index with the `lstat()` information
+  sent by the workers. (This must be done first as this information
+  might me required in the following step.)
+
+- Then it writes the items which collided on disk (i.e. items marked
+  with `PC_ITEM_COLLIDED`). More on this below.
+
+Path Collisions
+---------------
+
+Path collisions happen when two different paths correspond to the same
+entry in the file system. E.g. the paths 'a' and 'A' would collide in a
+case-insensitive file system.
+
+The sequential checkout deals with collisions in the same way that it
+deals with files that were already present in the working tree before
+checkout. Basically, it checks if the path that it wants to write
+already exists on disk, makes sure the existing file doesn't have
+unsaved data, and then overwrites it. (To be more pedantic: it deletes
+the existing file and creates the new one.) So, if there are multiple
+colliding files to be checked out, the sequential code will write each
+one of them but only the last will actually survive on disk.
+
+Parallel checkout aims to reproduce the same behavior. However, we
+cannot let the workers racily write to the same file on disk. Instead,
+the workers detect when the entry that they want to check out would
+collide with an existing file, and mark it with `PC_ITEM_COLLIDED`.
+Later, the main process can sequentially feed these entries back to
+`checkout_entry()` without the risk of race conditions. On clone, this
+also has the effect of marking the colliding entries to later emit a
+warning for the user, like the classic sequential checkout does.
+
+The workers are able to detect both collisions among the entries being
+concurrently written and collisions among parallel-eligible and
+ineligible entries. The general idea for collision detection is quite
+straightforward: for each parallel-eligible entry, the main process must
+remove all files that prevent this entry from being written (before
+enqueueing it). This includes any non-directory file in the leading path
+of the entry. Later, when a worker gets assigned the entry, it looks
+again for the non-directories files and for an already existing file at
+the entry's path. If any of these checks finds something, the worker
+knows that there was a path collision.
+
+Because parallel checkout can distinguish path collisions from the case
+where the file was already present in the working tree before checkout,
+we could alternatively choose to skip the checkout of colliding entries.
+However, each entry that doesn't get written would have NULL `lstat()`
+fields on the index. This could cause performance penalties for
+subsequent commands that need to refresh the index, as they would have
+to go to the file system to see if the entry is dirty. Thus, if we have
+N entries in a colliding group and we decide to write and `lstat()` only
+one of them, every subsequent `git-status` will have to read, convert,
+and hash the written file N - 1 times. By checking out all colliding
+entries (like the sequential code does), we only pay the overhead once,
+during checkout.
+
+Eligible Entries for Parallel Checkout
+--------------------------------------
+
+As previously mentioned, not all entries passed to `checkout_entry()`
+will be considered eligible for parallel checkout. More specifically, we
+exclude:
+
+- Symbolic links; to avoid race conditions that, in combination with
+  path collisions, could cause workers to write files at the wrong
+  place. For example, if we were to concurrently check out a symlink
+  'a' -> 'b' and a regular file 'A/f' in a case-insensitive file system,
+  we could potentially end up writing the file 'A/f' at 'a/f', due to a
+  race condition.
+
+- Regular files that require external filters (either "one shot" filters
+  or long-running process filters). These filters are black-boxes to Git
+  and may have their own internal locking or non-concurrent assumptions.
+  So it might not be safe to run multiple instances in parallel.
++
+Besides, long-running filters may use the delayed checkout feature to
+postpone the return of some filtered blobs. The delayed checkout queue
+and the parallel checkout queue are not compatible and should remain
+separated.
++
+Note: regular files that only require internal filters, like end-of-line
+conversion and re-encoding, are eligible for parallel checkout.
+
+Ineligible entries are checked out by the classic sequential codepath
+*before* spawning workers.
+
+Note: submodules's files are also eligible for parallel checkout (as
+long as they don't fall into any of the excluding categories mentioned
+above). But since each submodule is checked out in its own child
+process, we don't mix the superproject's and the submodules' files in
+the same parallel checkout process or queue.
+
+The API
+-------
+
+The parallel checkout API was designed with the goal to minimize changes
+to the current users of the checkout machinery. This means that they
+don't have to call a different function for sequential or parallel
+checkout. As already mentioned, `checkout_entry()` will automatically
+insert the given entry in the parallel checkout queue when this feature
+is enabled and the entry is eligible; otherwise, it will just write the
+entry right away, using the sequential code. In general, callers of the
+parallel checkout API should look similar to this:
+
+----------------------------------------------
+int pc_workers, pc_threshold, err = 0;
+struct checkout state;
+
+get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+
+/*
+ * This check is not strictly required, but it
+ * should save some time in sequential mode.
+ */
+if (pc_workers > 1)
+	init_parallel_checkout();
+
+for (each cache_entry ce to-be-updated)
+	err |= checkout_entry(ce, &state, NULL, NULL);
+
+err |= run_parallel_checkout(&state, pc_workers, pc_threshold, NULL, NULL);
+----------------------------------------------
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 0/5] Parallel Checkout (part 2)
  2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
                     ` (4 preceding siblings ...)
  2021-04-08 16:17   ` [PATCH v2 5/5] parallel-checkout: add design documentation Matheus Tavares
@ 2021-04-08 19:52   ` Junio C Hamano
  2021-04-16 21:43   ` Junio C Hamano
  2021-04-19  0:14   ` [PATCH v3 " Matheus Tavares
  7 siblings, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2021-04-08 19:52 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, christian.couder, git

Matheus Tavares <matheus.bernardino@usp.br> writes:

> This is the next step in the parallel checkout implementation. As
> mt/parallel-checkout-part-1 is now on master, this round is based
> directly on master.

Thanks, will replace.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 0/5] Parallel Checkout (part 2)
  2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
                     ` (5 preceding siblings ...)
  2021-04-08 19:52   ` [PATCH v2 0/5] Parallel Checkout (part 2) Junio C Hamano
@ 2021-04-16 21:43   ` Junio C Hamano
  2021-04-17 19:57     ` Matheus Tavares Bernardino
  2021-04-19  9:41     ` Christian Couder
  2021-04-19  0:14   ` [PATCH v3 " Matheus Tavares
  7 siblings, 2 replies; 40+ messages in thread
From: Junio C Hamano @ 2021-04-16 21:43 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, christian.couder, git

Matheus Tavares <matheus.bernardino@usp.br> writes:

> This is the next step in the parallel checkout implementation. As
> mt/parallel-checkout-part-1 is now on master, this round is based
> directly on master.
>
> Changes since v1:

Now, I do not think we've seen any response to these messages.

It seems that review comments for the earlier round cf.
<CAP8UFD1stvx=2hBWyxmu75SXRzX-bHBfGr2jxWKgHdc85cfxRg@mail.gmail.com>

have been addressed?  Should we merge it down to 'next' now?


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 0/5] Parallel Checkout (part 2)
  2021-04-16 21:43   ` Junio C Hamano
@ 2021-04-17 19:57     ` Matheus Tavares Bernardino
  2021-04-19  9:41     ` Christian Couder
  1 sibling, 0 replies; 40+ messages in thread
From: Matheus Tavares Bernardino @ 2021-04-17 19:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder, Jeff Hostetler

On Fri, Apr 16, 2021 at 6:43 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > This is the next step in the parallel checkout implementation. As
> > mt/parallel-checkout-part-1 is now on master, this round is based
> > directly on master.
> >
> > Changes since v1:
>
> Now, I do not think we've seen any response to these messages.
>
> It seems that review comments for the earlier round cf.
> <CAP8UFD1stvx=2hBWyxmu75SXRzX-bHBfGr2jxWKgHdc85cfxRg@mail.gmail.com>
>
> have been addressed?  Should we merge it down to 'next' now?

Yes, I think I've addressed all the comments from the previous round.
There is just a minor change that I made locally yesterday, so I'll
reroll the series with it now. Nonetheless, the change is quite small
and trivial, so I think the rerolled version can be merged to 'next'.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v3 0/5] Parallel Checkout (part 2)
  2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
                     ` (6 preceding siblings ...)
  2021-04-16 21:43   ` Junio C Hamano
@ 2021-04-19  0:14   ` Matheus Tavares
  2021-04-19  0:14     ` [PATCH v3 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
                       ` (5 more replies)
  7 siblings, 6 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-19  0:14 UTC (permalink / raw)
  To: gitster; +Cc: git, christian.couder, git

Three small changes since the last version: 

- In `write_pc_item_to_fd()`, renamed variable 'new_blob' to just 'blob'.

- In `write_pc_item_to_fd()`, used `ce->name` instead of `path` when
  reporting failure to read the object.

- Made `write_pc_item()` unlink() the file on a `write_pc_item_to_fd()`
  error. The reasoning for this is:

  Unlike sequential checkout, the parallel workers create each file
  *before* reading and converting the associated blob, so that they can
  detect path collisions as soon as possible in the process. This is
  important both to avoid unnecessary work reading and filtering
  objects that are not going to be written in parallel, and also to
  avoid reporting errors for colliding entries at the wrong time. These
  entries will be handled in a sequential phase later, and that will be
  the correct time to print any errors regarding their checkout.

  However, these logistics make us leave an empty file behind when we
  find an error after creating the file, e.g. a missing object. The
  added unlink() fix this case while maintaining the other important
  mechanics. The design doc was also adjusted to mention this.

Matheus Tavares (5):
  unpack-trees: add basic support for parallel checkout
  parallel-checkout: make it truly parallel
  parallel-checkout: add configuration options
  parallel-checkout: support progress displaying
  parallel-checkout: add design documentation

 .gitignore                                    |   1 +
 Documentation/Makefile                        |   1 +
 Documentation/config/checkout.txt             |  21 +
 Documentation/technical/parallel-checkout.txt | 271 ++++++++
 Makefile                                      |   2 +
 builtin.h                                     |   1 +
 builtin/checkout--worker.c                    | 145 ++++
 entry.c                                       |  17 +-
 git.c                                         |   2 +
 parallel-checkout.c                           | 655 ++++++++++++++++++
 parallel-checkout.h                           | 111 +++
 unpack-trees.c                                |  19 +-
 12 files changed, 1241 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/technical/parallel-checkout.txt
 create mode 100644 builtin/checkout--worker.c
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h

Range-diff against v2:
1:  0506ac7159 ! 1:  7096822c14 unpack-trees: add basic support for parallel checkout
    @@ parallel-checkout.c (new)
     +	int ret;
     +	struct stream_filter *filter;
     +	struct strbuf buf = STRBUF_INIT;
    -+	char *new_blob;
    ++	char *blob;
     +	unsigned long size;
     +	ssize_t wrote;
     +
    @@ parallel-checkout.c (new)
     +		}
     +	}
     +
    -+	new_blob = read_blob_entry(pc_item->ce, &size);
    -+	if (!new_blob)
    ++	blob = read_blob_entry(pc_item->ce, &size);
    ++	if (!blob)
     +		return error("cannot read object %s '%s'",
    -+			     oid_to_hex(&pc_item->ce->oid), path);
    ++			     oid_to_hex(&pc_item->ce->oid), pc_item->ce->name);
     +
     +	/*
     +	 * checkout metadata is used to give context for external process
    @@ parallel-checkout.c (new)
     +	 * checkout, so pass NULL.
     +	 */
     +	ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
    -+					 new_blob, size, &buf, NULL);
    ++					 blob, size, &buf, NULL);
     +
     +	if (ret) {
     +		size_t newsize;
    -+		free(new_blob);
    -+		new_blob = strbuf_detach(&buf, &newsize);
    ++		free(blob);
    ++		blob = strbuf_detach(&buf, &newsize);
     +		size = newsize;
     +	}
     +
    -+	wrote = write_in_full(fd, new_blob, size);
    -+	free(new_blob);
    ++	wrote = write_in_full(fd, blob, size);
    ++	free(blob);
     +	if (wrote < 0)
     +		return error("unable to write file '%s'", path);
     +
    @@ parallel-checkout.c (new)
     +	if (write_pc_item_to_fd(pc_item, fd, path.buf)) {
     +		/* Error was already reported. */
     +		pc_item->status = PC_ITEM_FAILED;
    ++		close_and_clear(&fd);
    ++		unlink(path.buf);
     +		goto out;
     +	}
     +
    @@ parallel-checkout.c (new)
     +	pc_item->status = PC_ITEM_WRITTEN;
     +
     +out:
    -+	/*
    -+	 * No need to check close() return at this point. Either fd is already
    -+	 * closed, or we are on an error path.
    -+	 */
    -+	close_and_clear(&fd);
     +	strbuf_release(&path);
     +}
     +
2:  0d65b517bd ! 2:  4526516ea0 parallel-checkout: make it truly parallel
    @@ parallel-checkout.c: static int write_pc_item_to_fd(struct parallel_checkout_ite
     +	 * be passed from the main process to the workers.
      	 */
      	ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
    - 					 new_blob, size, &buf, NULL);
    + 					 blob, size, &buf, NULL);
     @@ parallel-checkout.c: static int close_and_clear(int *fd)
      	return ret;
      }
3:  6ea057f9c5 = 3:  ad165c0637 parallel-checkout: add configuration options
4:  0ac4bee67e = 4:  cf9e28dc0e parallel-checkout: support progress displaying
5:  087f8bdf35 ! 5:  415d4114aa parallel-checkout: add design documentation
    @@ Documentation/technical/parallel-checkout.txt (new)
     +Note that, when possible, steps W3 to W5 are delegated to the streaming
     +machinery, removing the need to keep the entire blob in memory.
     +
    -+Also note that the workers *never* remove any file. As mentioned
    -+earlier, it is the responsibility of the main process to remove any file
    -+that blocks the checkout operation (or abort if removing the file(s)
    -+would cause data loss and the user didn't ask to `--force`). This is
    -+crucial to avoid race conditions and also to properly detect path
    -+collisions at Step W1.
    ++If the worker fails to read the blob or to write it to the working tree,
    ++it removes the created file to avoid leaving empty files behind. This is
    ++the *only* time a worker is allowed to remove a file.
    ++
    ++As mentioned earlier, it is the responsibility of the main process to
    ++remove any file that blocks the checkout operation (or abort if the
    ++removal(s) would cause data loss and the user didn't ask to `--force`).
    ++This is crucial to avoid race conditions and also to properly detect
    ++path collisions at Step W1.
     +
     +After the workers finish writing the items and sending back the required
     +information, the main process handles the results in two steps:
-- 
2.30.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v3 1/5] unpack-trees: add basic support for parallel checkout
  2021-04-19  0:14   ` [PATCH v3 " Matheus Tavares
@ 2021-04-19  0:14     ` Matheus Tavares
  2021-04-19  0:14     ` [PATCH v3 2/5] parallel-checkout: make it truly parallel Matheus Tavares
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-19  0:14 UTC (permalink / raw)
  To: gitster; +Cc: git, christian.couder, git

This new interface allows us to enqueue some of the entries being
checked out to later uncompress them, apply in-process filters, and
write out the files in parallel. For now, the parallel checkout
machinery is enabled by default and there is no user configuration, but
run_parallel_checkout() just writes the queued entries in sequence
(without spawning additional workers). The next patch will actually
implement the parallelism and, later, we will make it configurable.

Note that, to avoid potential data races, not all entries are eligible
for parallel checkout. Also, paths that collide on disk (e.g.
case-sensitive paths in case-insensitive file systems), are detected by
the parallel checkout code and skipped, so that they can be safely
sequentially handled later. The collision detection works like the
following:

- If the collision was at basename (e.g. 'a/b' and 'a/B'), the framework
  detects it by looking for EEXIST and EISDIR errors after an
  open(O_CREAT | O_EXCL) failure.

- If the collision was at dirname (e.g. 'a/b' and 'A'), it is detected
  at the has_dirs_only_path() check, which is done for the leading path
  of each item in the parallel checkout queue.

Both verifications rely on the fact that, before enqueueing an entry for
parallel checkout, checkout_entry() makes sure that there is no file at
the entry's path and that its leading components are all real
directories. So, any later change in these conditions indicates that
there was a collision (either between two parallel-eligible entries or
between an eligible and an ineligible one).

After all parallel-eligible entries have been processed, the collided
(and thus, skipped) entries are sequentially fed to checkout_entry()
again. This is similar to the way the current code deals with
collisions, overwriting the previously checked out entries with the
subsequent ones. The only difference is that, since we no longer create
the files in the same order that they appear on index, we are not able
to determine which of the colliding entries will survive on disk (for
the classic code, it is always the last entry).

Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Makefile            |   1 +
 entry.c             |  17 ++-
 parallel-checkout.c | 365 ++++++++++++++++++++++++++++++++++++++++++++
 parallel-checkout.h |  32 ++++
 unpack-trees.c      |   6 +-
 5 files changed, 418 insertions(+), 3 deletions(-)
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h

diff --git a/Makefile b/Makefile
index a6a73c5741..99734cc2ea 100644
--- a/Makefile
+++ b/Makefile
@@ -946,6 +946,7 @@ LIB_OBJS += pack-revindex.o
 LIB_OBJS += pack-write.o
 LIB_OBJS += packfile.o
 LIB_OBJS += pager.o
+LIB_OBJS += parallel-checkout.o
 LIB_OBJS += parse-options-cb.o
 LIB_OBJS += parse-options.o
 LIB_OBJS += patch-delta.o
diff --git a/entry.c b/entry.c
index 2dc94ba5cc..d7ed38aa40 100644
--- a/entry.c
+++ b/entry.c
@@ -7,6 +7,7 @@
 #include "progress.h"
 #include "fsmonitor.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -426,8 +427,17 @@ static void mark_colliding_entries(const struct checkout *state,
 	for (i = 0; i < state->istate->cache_nr; i++) {
 		struct cache_entry *dup = state->istate->cache[i];
 
-		if (dup == ce)
-			break;
+		if (dup == ce) {
+			/*
+			 * Parallel checkout doesn't create the files in index
+			 * order. So the other side of the collision may appear
+			 * after the given cache_entry in the array.
+			 */
+			if (parallel_checkout_status() == PC_RUNNING)
+				continue;
+			else
+				break;
+		}
 
 		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
 			continue;
@@ -536,6 +546,9 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
 		ca = &ca_buf;
 	}
 
+	if (!enqueue_checkout(ce, ca))
+		return 0;
+
 	return write_entry(ce, path.buf, ca, state, 0);
 }
 
diff --git a/parallel-checkout.c b/parallel-checkout.c
new file mode 100644
index 0000000000..590e2a3046
--- /dev/null
+++ b/parallel-checkout.c
@@ -0,0 +1,365 @@
+#include "cache.h"
+#include "entry.h"
+#include "parallel-checkout.h"
+#include "streaming.h"
+
+enum pc_item_status {
+	PC_ITEM_PENDING = 0,
+	PC_ITEM_WRITTEN,
+	/*
+	 * The entry could not be written because there was another file
+	 * already present in its path or leading directories. Since
+	 * checkout_entry_ca() removes such files from the working tree before
+	 * enqueueing the entry for parallel checkout, it means that there was
+	 * a path collision among the entries being written.
+	 */
+	PC_ITEM_COLLIDED,
+	PC_ITEM_FAILED,
+};
+
+struct parallel_checkout_item {
+	/* pointer to a istate->cache[] entry. Not owned by us. */
+	struct cache_entry *ce;
+	struct conv_attrs ca;
+	struct stat st;
+	enum pc_item_status status;
+};
+
+struct parallel_checkout {
+	enum pc_status status;
+	struct parallel_checkout_item *items; /* The parallel checkout queue. */
+	size_t nr, alloc;
+};
+
+static struct parallel_checkout parallel_checkout;
+
+enum pc_status parallel_checkout_status(void)
+{
+	return parallel_checkout.status;
+}
+
+void init_parallel_checkout(void)
+{
+	if (parallel_checkout.status != PC_UNINITIALIZED)
+		BUG("parallel checkout already initialized");
+
+	parallel_checkout.status = PC_ACCEPTING_ENTRIES;
+}
+
+static void finish_parallel_checkout(void)
+{
+	if (parallel_checkout.status == PC_UNINITIALIZED)
+		BUG("cannot finish parallel checkout: not initialized yet");
+
+	free(parallel_checkout.items);
+	memset(&parallel_checkout, 0, sizeof(parallel_checkout));
+}
+
+static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
+					     const struct conv_attrs *ca)
+{
+	enum conv_attrs_classification c;
+
+	/*
+	 * Symlinks cannot be checked out in parallel as, in case of path
+	 * collision, they could racily replace leading directories of other
+	 * entries being checked out. Submodules are checked out in child
+	 * processes, which have their own parallel checkout queues.
+	 */
+	if (!S_ISREG(ce->ce_mode))
+		return 0;
+
+	c = classify_conv_attrs(ca);
+	switch (c) {
+	case CA_CLASS_INCORE:
+		return 1;
+
+	case CA_CLASS_INCORE_FILTER:
+		/*
+		 * It would be safe to allow concurrent instances of
+		 * single-file smudge filters, like rot13, but we should not
+		 * assume that all filters are parallel-process safe. So we
+		 * don't allow this.
+		 */
+		return 0;
+
+	case CA_CLASS_INCORE_PROCESS:
+		/*
+		 * The parallel queue and the delayed queue are not compatible,
+		 * so they must be kept completely separated. And we can't tell
+		 * if a long-running process will delay its response without
+		 * actually asking it to perform the filtering. Therefore, this
+		 * type of filter is not allowed in parallel checkout.
+		 *
+		 * Furthermore, there should only be one instance of the
+		 * long-running process filter as we don't know how it is
+		 * managing its own concurrency. So, spreading the entries that
+		 * requisite such a filter among the parallel workers would
+		 * require a lot more inter-process communication. We would
+		 * probably have to designate a single process to interact with
+		 * the filter and send all the necessary data to it, for each
+		 * entry.
+		 */
+		return 0;
+
+	case CA_CLASS_STREAMABLE:
+		return 1;
+
+	default:
+		BUG("unsupported conv_attrs classification '%d'", c);
+	}
+}
+
+int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
+{
+	struct parallel_checkout_item *pc_item;
+
+	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES ||
+	    !is_eligible_for_parallel_checkout(ce, ca))
+		return -1;
+
+	ALLOC_GROW(parallel_checkout.items, parallel_checkout.nr + 1,
+		   parallel_checkout.alloc);
+
+	pc_item = &parallel_checkout.items[parallel_checkout.nr++];
+	pc_item->ce = ce;
+	memcpy(&pc_item->ca, ca, sizeof(pc_item->ca));
+	pc_item->status = PC_ITEM_PENDING;
+
+	return 0;
+}
+
+static int handle_results(struct checkout *state)
+{
+	int ret = 0;
+	size_t i;
+	int have_pending = 0;
+
+	/*
+	 * We first update the successfully written entries with the collected
+	 * stat() data, so that they can be found by mark_colliding_entries(),
+	 * in the next loop, when necessary.
+	 */
+	for (i = 0; i < parallel_checkout.nr; i++) {
+		struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+		if (pc_item->status == PC_ITEM_WRITTEN)
+			update_ce_after_write(state, pc_item->ce, &pc_item->st);
+	}
+
+	for (i = 0; i < parallel_checkout.nr; i++) {
+		struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+
+		switch(pc_item->status) {
+		case PC_ITEM_WRITTEN:
+			/* Already handled */
+			break;
+		case PC_ITEM_COLLIDED:
+			/*
+			 * The entry could not be checked out due to a path
+			 * collision with another entry. Since there can only
+			 * be one entry of each colliding group on the disk, we
+			 * could skip trying to check out this one and move on.
+			 * However, this would leave the unwritten entries with
+			 * null stat() fields on the index, which could
+			 * potentially slow down subsequent operations that
+			 * require refreshing it: git would not be able to
+			 * trust st_size and would have to go to the filesystem
+			 * to see if the contents match (see ie_modified()).
+			 *
+			 * Instead, let's pay the overhead only once, now, and
+			 * call checkout_entry_ca() again for this file, to
+			 * have its stat() data stored in the index. This also
+			 * has the benefit of adding this entry and its
+			 * colliding pair to the collision report message.
+			 * Additionally, this overwriting behavior is consistent
+			 * with what the sequential checkout does, so it doesn't
+			 * add any extra overhead.
+			 */
+			ret |= checkout_entry_ca(pc_item->ce, &pc_item->ca,
+						 state, NULL, NULL);
+			break;
+		case PC_ITEM_PENDING:
+			have_pending = 1;
+			/* fall through */
+		case PC_ITEM_FAILED:
+			ret = -1;
+			break;
+		default:
+			BUG("unknown checkout item status in parallel checkout");
+		}
+	}
+
+	if (have_pending)
+		error("parallel checkout finished with pending entries");
+
+	return ret;
+}
+
+static int reset_fd(int fd, const char *path)
+{
+	if (lseek(fd, 0, SEEK_SET) != 0)
+		return error_errno("failed to rewind descriptor of '%s'", path);
+	if (ftruncate(fd, 0))
+		return error_errno("failed to truncate file '%s'", path);
+	return 0;
+}
+
+static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
+			       const char *path)
+{
+	int ret;
+	struct stream_filter *filter;
+	struct strbuf buf = STRBUF_INIT;
+	char *blob;
+	unsigned long size;
+	ssize_t wrote;
+
+	/* Sanity check */
+	assert(is_eligible_for_parallel_checkout(pc_item->ce, &pc_item->ca));
+
+	filter = get_stream_filter_ca(&pc_item->ca, &pc_item->ce->oid);
+	if (filter) {
+		if (stream_blob_to_fd(fd, &pc_item->ce->oid, filter, 1)) {
+			/* On error, reset fd to try writing without streaming */
+			if (reset_fd(fd, path))
+				return -1;
+		} else {
+			return 0;
+		}
+	}
+
+	blob = read_blob_entry(pc_item->ce, &size);
+	if (!blob)
+		return error("cannot read object %s '%s'",
+			     oid_to_hex(&pc_item->ce->oid), pc_item->ce->name);
+
+	/*
+	 * checkout metadata is used to give context for external process
+	 * filters. Files requiring such filters are not eligible for parallel
+	 * checkout, so pass NULL.
+	 */
+	ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
+					 blob, size, &buf, NULL);
+
+	if (ret) {
+		size_t newsize;
+		free(blob);
+		blob = strbuf_detach(&buf, &newsize);
+		size = newsize;
+	}
+
+	wrote = write_in_full(fd, blob, size);
+	free(blob);
+	if (wrote < 0)
+		return error("unable to write file '%s'", path);
+
+	return 0;
+}
+
+static int close_and_clear(int *fd)
+{
+	int ret = 0;
+
+	if (*fd >= 0) {
+		ret = close(*fd);
+		*fd = -1;
+	}
+
+	return ret;
+}
+
+static void write_pc_item(struct parallel_checkout_item *pc_item,
+			  struct checkout *state)
+{
+	unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
+	int fd = -1, fstat_done = 0;
+	struct strbuf path = STRBUF_INIT;
+	const char *dir_sep;
+
+	strbuf_add(&path, state->base_dir, state->base_dir_len);
+	strbuf_add(&path, pc_item->ce->name, pc_item->ce->ce_namelen);
+
+	dir_sep = find_last_dir_sep(path.buf);
+
+	/*
+	 * The leading dirs should have been already created by now. But, in
+	 * case of path collisions, one of the dirs could have been replaced by
+	 * a symlink (checked out after we enqueued this entry for parallel
+	 * checkout). Thus, we must check the leading dirs again.
+	 */
+	if (dir_sep && !has_dirs_only_path(path.buf, dir_sep - path.buf,
+					   state->base_dir_len)) {
+		pc_item->status = PC_ITEM_COLLIDED;
+		goto out;
+	}
+
+	fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, mode);
+
+	if (fd < 0) {
+		if (errno == EEXIST || errno == EISDIR) {
+			/*
+			 * Errors which probably represent a path collision.
+			 * Suppress the error message and mark the item to be
+			 * retried later, sequentially. ENOTDIR and ENOENT are
+			 * also interesting, but the above has_dirs_only_path()
+			 * call should have already caught these cases.
+			 */
+			pc_item->status = PC_ITEM_COLLIDED;
+		} else {
+			error_errno("failed to open file '%s'", path.buf);
+			pc_item->status = PC_ITEM_FAILED;
+		}
+		goto out;
+	}
+
+	if (write_pc_item_to_fd(pc_item, fd, path.buf)) {
+		/* Error was already reported. */
+		pc_item->status = PC_ITEM_FAILED;
+		close_and_clear(&fd);
+		unlink(path.buf);
+		goto out;
+	}
+
+	fstat_done = fstat_checkout_output(fd, state, &pc_item->st);
+
+	if (close_and_clear(&fd)) {
+		error_errno("unable to close file '%s'", path.buf);
+		pc_item->status = PC_ITEM_FAILED;
+		goto out;
+	}
+
+	if (state->refresh_cache && !fstat_done && lstat(path.buf, &pc_item->st) < 0) {
+		error_errno("unable to stat just-written file '%s'",  path.buf);
+		pc_item->status = PC_ITEM_FAILED;
+		goto out;
+	}
+
+	pc_item->status = PC_ITEM_WRITTEN;
+
+out:
+	strbuf_release(&path);
+}
+
+static void write_items_sequentially(struct checkout *state)
+{
+	size_t i;
+
+	for (i = 0; i < parallel_checkout.nr; i++)
+		write_pc_item(&parallel_checkout.items[i], state);
+}
+
+int run_parallel_checkout(struct checkout *state)
+{
+	int ret;
+
+	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
+		BUG("cannot run parallel checkout: uninitialized or already running");
+
+	parallel_checkout.status = PC_RUNNING;
+
+	write_items_sequentially(state);
+	ret = handle_results(state);
+
+	finish_parallel_checkout();
+	return ret;
+}
diff --git a/parallel-checkout.h b/parallel-checkout.h
new file mode 100644
index 0000000000..4ad2a519b3
--- /dev/null
+++ b/parallel-checkout.h
@@ -0,0 +1,32 @@
+#ifndef PARALLEL_CHECKOUT_H
+#define PARALLEL_CHECKOUT_H
+
+struct cache_entry;
+struct checkout;
+struct conv_attrs;
+
+enum pc_status {
+	PC_UNINITIALIZED = 0,
+	PC_ACCEPTING_ENTRIES,
+	PC_RUNNING,
+};
+
+enum pc_status parallel_checkout_status(void);
+
+/*
+ * Put parallel checkout into the PC_ACCEPTING_ENTRIES state. Should be used
+ * only when in the PC_UNINITIALIZED state.
+ */
+void init_parallel_checkout(void);
+
+/*
+ * Return -1 if parallel checkout is currently not accepting entries or if the
+ * entry is not eligible for parallel checkout. Otherwise, enqueue the entry
+ * for later write and return 0.
+ */
+int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
+
+/* Write all the queued entries, returning 0 on success.*/
+int run_parallel_checkout(struct checkout *state);
+
+#endif /* PARALLEL_CHECKOUT_H */
diff --git a/unpack-trees.c b/unpack-trees.c
index 8a1afbc1e4..f0430d458d 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -17,6 +17,7 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -441,7 +442,6 @@ static int check_updates(struct unpack_trees_options *o,
 	if (should_update_submodules())
 		load_gitmodules_file(index, &state);
 
-	enable_delayed_checkout(&state);
 	if (has_promisor_remote()) {
 		/*
 		 * Prefetch the objects that are to be checked out in the loop
@@ -464,6 +464,9 @@ static int check_updates(struct unpack_trees_options *o,
 					   to_fetch.oid, to_fetch.nr);
 		oid_array_clear(&to_fetch);
 	}
+
+	enable_delayed_checkout(&state);
+	init_parallel_checkout();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -477,6 +480,7 @@ static int check_updates(struct unpack_trees_options *o,
 		}
 	}
 	stop_progress(&progress);
+	errs |= run_parallel_checkout(&state);
 	errs |= finish_delayed_checkout(&state, NULL);
 	git_attr_set_direction(GIT_ATTR_CHECKIN);
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v3 2/5] parallel-checkout: make it truly parallel
  2021-04-19  0:14   ` [PATCH v3 " Matheus Tavares
  2021-04-19  0:14     ` [PATCH v3 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
@ 2021-04-19  0:14     ` Matheus Tavares
  2021-04-19  0:14     ` [PATCH v3 3/5] parallel-checkout: add configuration options Matheus Tavares
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-19  0:14 UTC (permalink / raw)
  To: gitster; +Cc: git, christian.couder, git

Use multiple worker processes to distribute the queued entries and call
write_pc_item() in parallel for them. The items are distributed
uniformly in contiguous chunks. This minimizes the chances of two
workers writing to the same directory simultaneously, which could affect
performance due to lock contention in the kernel. Work stealing (or any
other format of re-distribution) is not implemented yet.

The protocol between the main process and the workers is quite simple.
They exchange binary messages packed in pkt-line format, and use
PKT-FLUSH to mark the end of input (from both sides). The main process
starts the communication by sending N pkt-lines, each corresponding to
an item that needs to be written. These packets contain all the
necessary information to load, smudge, and write the blob associated
with each item. Then it waits for the worker to send back N pkt-lines
containing the results for each item. The resulting packet must contain:
the identification number of the item that it refers to, the status of
the operation, and the lstat() data gathered after writing the file (iff
the operation was successful).

For now, checkout always uses a hardcoded value of 2 workers, only to
demonstrate that the parallel checkout framework correctly divides and
writes the queued entries. The next patch will add user configurations
and define a more reasonable default, based on tests with the said
settings.

Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 .gitignore                 |   1 +
 Makefile                   |   1 +
 builtin.h                  |   1 +
 builtin/checkout--worker.c | 145 ++++++++++++++++++
 git.c                      |   2 +
 parallel-checkout.c        | 300 +++++++++++++++++++++++++++++++++----
 parallel-checkout.h        |  73 ++++++++-
 7 files changed, 496 insertions(+), 27 deletions(-)
 create mode 100644 builtin/checkout--worker.c

diff --git a/.gitignore b/.gitignore
index 3dcdb6bb5a..96c794b1c7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -33,6 +33,7 @@
 /git-check-mailmap
 /git-check-ref-format
 /git-checkout
+/git-checkout--worker
 /git-checkout-index
 /git-cherry
 /git-cherry-pick
diff --git a/Makefile b/Makefile
index 99734cc2ea..84e15f58a0 100644
--- a/Makefile
+++ b/Makefile
@@ -1062,6 +1062,7 @@ BUILTIN_OBJS += builtin/check-attr.o
 BUILTIN_OBJS += builtin/check-ignore.o
 BUILTIN_OBJS += builtin/check-mailmap.o
 BUILTIN_OBJS += builtin/check-ref-format.o
+BUILTIN_OBJS += builtin/checkout--worker.o
 BUILTIN_OBJS += builtin/checkout-index.o
 BUILTIN_OBJS += builtin/checkout.o
 BUILTIN_OBJS += builtin/clean.o
diff --git a/builtin.h b/builtin.h
index b6ce981b73..16ecd5586f 100644
--- a/builtin.h
+++ b/builtin.h
@@ -123,6 +123,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix);
 int cmd_bundle(int argc, const char **argv, const char *prefix);
 int cmd_cat_file(int argc, const char **argv, const char *prefix);
 int cmd_checkout(int argc, const char **argv, const char *prefix);
+int cmd_checkout__worker(int argc, const char **argv, const char *prefix);
 int cmd_checkout_index(int argc, const char **argv, const char *prefix);
 int cmd_check_attr(int argc, const char **argv, const char *prefix);
 int cmd_check_ignore(int argc, const char **argv, const char *prefix);
diff --git a/builtin/checkout--worker.c b/builtin/checkout--worker.c
new file mode 100644
index 0000000000..31e0de2f7e
--- /dev/null
+++ b/builtin/checkout--worker.c
@@ -0,0 +1,145 @@
+#include "builtin.h"
+#include "config.h"
+#include "entry.h"
+#include "parallel-checkout.h"
+#include "parse-options.h"
+#include "pkt-line.h"
+
+static void packet_to_pc_item(const char *buffer, int len,
+			      struct parallel_checkout_item *pc_item)
+{
+	const struct pc_item_fixed_portion *fixed_portion;
+	const char *variant;
+	char *encoding;
+
+	if (len < sizeof(struct pc_item_fixed_portion))
+		BUG("checkout worker received too short item (got %dB, exp %dB)",
+		    len, (int)sizeof(struct pc_item_fixed_portion));
+
+	fixed_portion = (struct pc_item_fixed_portion *)buffer;
+
+	if (len - sizeof(struct pc_item_fixed_portion) !=
+		fixed_portion->name_len + fixed_portion->working_tree_encoding_len)
+		BUG("checkout worker received corrupted item");
+
+	variant = buffer + sizeof(struct pc_item_fixed_portion);
+
+	/*
+	 * Note: the main process uses zero length to communicate that the
+	 * encoding is NULL. There is no use case that requires sending an
+	 * actual empty string, since convert_attrs() never sets
+	 * ca.working_tree_enconding to "".
+	 */
+	if (fixed_portion->working_tree_encoding_len) {
+		encoding = xmemdupz(variant,
+				    fixed_portion->working_tree_encoding_len);
+		variant += fixed_portion->working_tree_encoding_len;
+	} else {
+		encoding = NULL;
+	}
+
+	memset(pc_item, 0, sizeof(*pc_item));
+	pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len);
+	pc_item->ce->ce_namelen = fixed_portion->name_len;
+	pc_item->ce->ce_mode = fixed_portion->ce_mode;
+	memcpy(pc_item->ce->name, variant, pc_item->ce->ce_namelen);
+	oidcpy(&pc_item->ce->oid, &fixed_portion->oid);
+
+	pc_item->id = fixed_portion->id;
+	pc_item->ca.crlf_action = fixed_portion->crlf_action;
+	pc_item->ca.ident = fixed_portion->ident;
+	pc_item->ca.working_tree_encoding = encoding;
+}
+
+static void report_result(struct parallel_checkout_item *pc_item)
+{
+	struct pc_item_result res;
+	size_t size;
+
+	res.id = pc_item->id;
+	res.status = pc_item->status;
+
+	if (pc_item->status == PC_ITEM_WRITTEN) {
+		res.st = pc_item->st;
+		size = sizeof(res);
+	} else {
+		size = PC_ITEM_RESULT_BASE_SIZE;
+	}
+
+	packet_write(1, (const char *)&res, size);
+}
+
+/* Free the worker-side malloced data, but not pc_item itself. */
+static void release_pc_item_data(struct parallel_checkout_item *pc_item)
+{
+	free((char *)pc_item->ca.working_tree_encoding);
+	discard_cache_entry(pc_item->ce);
+}
+
+static void worker_loop(struct checkout *state)
+{
+	struct parallel_checkout_item *items = NULL;
+	size_t i, nr = 0, alloc = 0;
+
+	while (1) {
+		int len = packet_read(0, NULL, NULL, packet_buffer,
+				      sizeof(packet_buffer), 0);
+
+		if (len < 0)
+			BUG("packet_read() returned negative value");
+		else if (!len)
+			break;
+
+		ALLOC_GROW(items, nr + 1, alloc);
+		packet_to_pc_item(packet_buffer, len, &items[nr++]);
+	}
+
+	for (i = 0; i < nr; i++) {
+		struct parallel_checkout_item *pc_item = &items[i];
+		write_pc_item(pc_item, state);
+		report_result(pc_item);
+		release_pc_item_data(pc_item);
+	}
+
+	packet_flush(1);
+
+	free(items);
+}
+
+static const char * const checkout_worker_usage[] = {
+	N_("git checkout--worker [<options>]"),
+	NULL
+};
+
+int cmd_checkout__worker(int argc, const char **argv, const char *prefix)
+{
+	struct checkout state = CHECKOUT_INIT;
+	struct option checkout_worker_options[] = {
+		OPT_STRING(0, "prefix", &state.base_dir, N_("string"),
+			N_("when creating files, prepend <string>")),
+		OPT_END()
+	};
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(checkout_worker_usage,
+				   checkout_worker_options);
+
+	git_config(git_default_config, NULL);
+	argc = parse_options(argc, argv, prefix, checkout_worker_options,
+			     checkout_worker_usage, 0);
+	if (argc > 0)
+		usage_with_options(checkout_worker_usage, checkout_worker_options);
+
+	if (state.base_dir)
+		state.base_dir_len = strlen(state.base_dir);
+
+	/*
+	 * Setting this on a worker won't actually update the index. We just
+	 * need to tell the checkout machinery to lstat() the written entries,
+	 * so that we can send this data back to the main process.
+	 */
+	state.refresh_cache = 1;
+
+	worker_loop(&state);
+	return 0;
+}
diff --git a/git.c b/git.c
index 9bc077a025..532b5b4136 100644
--- a/git.c
+++ b/git.c
@@ -490,6 +490,8 @@ static struct cmd_struct commands[] = {
 	{ "check-mailmap", cmd_check_mailmap, RUN_SETUP },
 	{ "check-ref-format", cmd_check_ref_format, NO_PARSEOPT  },
 	{ "checkout", cmd_checkout, RUN_SETUP | NEED_WORK_TREE },
+	{ "checkout--worker", cmd_checkout__worker,
+		RUN_SETUP | NEED_WORK_TREE | SUPPORT_SUPER_PREFIX },
 	{ "checkout-index", cmd_checkout_index,
 		RUN_SETUP | NEED_WORK_TREE},
 	{ "cherry", cmd_cherry, RUN_SETUP },
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 590e2a3046..836154fec6 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -1,28 +1,14 @@
 #include "cache.h"
 #include "entry.h"
 #include "parallel-checkout.h"
+#include "pkt-line.h"
+#include "run-command.h"
+#include "sigchain.h"
 #include "streaming.h"
 
-enum pc_item_status {
-	PC_ITEM_PENDING = 0,
-	PC_ITEM_WRITTEN,
-	/*
-	 * The entry could not be written because there was another file
-	 * already present in its path or leading directories. Since
-	 * checkout_entry_ca() removes such files from the working tree before
-	 * enqueueing the entry for parallel checkout, it means that there was
-	 * a path collision among the entries being written.
-	 */
-	PC_ITEM_COLLIDED,
-	PC_ITEM_FAILED,
-};
-
-struct parallel_checkout_item {
-	/* pointer to a istate->cache[] entry. Not owned by us. */
-	struct cache_entry *ce;
-	struct conv_attrs ca;
-	struct stat st;
-	enum pc_item_status status;
+struct pc_worker {
+	struct child_process cp;
+	size_t next_item_to_complete, nr_items_to_complete;
 };
 
 struct parallel_checkout {
@@ -59,6 +45,7 @@ static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
 					     const struct conv_attrs *ca)
 {
 	enum conv_attrs_classification c;
+	size_t packed_item_size;
 
 	/*
 	 * Symlinks cannot be checked out in parallel as, in case of path
@@ -69,6 +56,18 @@ static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
 	if (!S_ISREG(ce->ce_mode))
 		return 0;
 
+	packed_item_size = sizeof(struct pc_item_fixed_portion) + ce->ce_namelen +
+		(ca->working_tree_encoding ? strlen(ca->working_tree_encoding) : 0);
+
+	/*
+	 * The amount of data we send to the workers per checkout item is
+	 * typically small (75~300B). So unless we find an insanely huge path
+	 * of 64KB, we should never reach the 65KB limit of one pkt-line. If
+	 * that does happen, we let the sequential code handle the item.
+	 */
+	if (packed_item_size > LARGE_PACKET_DATA_MAX)
+		return 0;
+
 	c = classify_conv_attrs(ca);
 	switch (c) {
 	case CA_CLASS_INCORE:
@@ -121,10 +120,12 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
 	ALLOC_GROW(parallel_checkout.items, parallel_checkout.nr + 1,
 		   parallel_checkout.alloc);
 
-	pc_item = &parallel_checkout.items[parallel_checkout.nr++];
+	pc_item = &parallel_checkout.items[parallel_checkout.nr];
 	pc_item->ce = ce;
 	memcpy(&pc_item->ca, ca, sizeof(pc_item->ca));
 	pc_item->status = PC_ITEM_PENDING;
+	pc_item->id = parallel_checkout.nr;
+	parallel_checkout.nr++;
 
 	return 0;
 }
@@ -236,7 +237,8 @@ static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
 	/*
 	 * checkout metadata is used to give context for external process
 	 * filters. Files requiring such filters are not eligible for parallel
-	 * checkout, so pass NULL.
+	 * checkout, so pass NULL. Note: if that changes, the metadata must also
+	 * be passed from the main process to the workers.
 	 */
 	ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
 					 blob, size, &buf, NULL);
@@ -268,8 +270,8 @@ static int close_and_clear(int *fd)
 	return ret;
 }
 
-static void write_pc_item(struct parallel_checkout_item *pc_item,
-			  struct checkout *state)
+void write_pc_item(struct parallel_checkout_item *pc_item,
+		   struct checkout *state)
 {
 	unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
 	int fd = -1, fstat_done = 0;
@@ -340,6 +342,240 @@ static void write_pc_item(struct parallel_checkout_item *pc_item,
 	strbuf_release(&path);
 }
 
+static void send_one_item(int fd, struct parallel_checkout_item *pc_item)
+{
+	size_t len_data;
+	char *data, *variant;
+	struct pc_item_fixed_portion *fixed_portion;
+	const char *working_tree_encoding = pc_item->ca.working_tree_encoding;
+	size_t name_len = pc_item->ce->ce_namelen;
+	size_t working_tree_encoding_len = working_tree_encoding ?
+					   strlen(working_tree_encoding) : 0;
+
+	/*
+	 * Any changes in the calculation of the message size must also be made
+	 * in is_eligible_for_parallel_checkout().
+	 */
+	len_data = sizeof(struct pc_item_fixed_portion) + name_len +
+		   working_tree_encoding_len;
+
+	data = xcalloc(1, len_data);
+
+	fixed_portion = (struct pc_item_fixed_portion *)data;
+	fixed_portion->id = pc_item->id;
+	fixed_portion->ce_mode = pc_item->ce->ce_mode;
+	fixed_portion->crlf_action = pc_item->ca.crlf_action;
+	fixed_portion->ident = pc_item->ca.ident;
+	fixed_portion->name_len = name_len;
+	fixed_portion->working_tree_encoding_len = working_tree_encoding_len;
+	/*
+	 * We use hashcpy() instead of oidcpy() because the hash[] positions
+	 * after `the_hash_algo->rawsz` might not be initialized. And Valgrind
+	 * would complain about passing uninitialized bytes to a syscall
+	 * (write(2)). There is no real harm in this case, but the warning could
+	 * hinder the detection of actual errors.
+	 */
+	hashcpy(fixed_portion->oid.hash, pc_item->ce->oid.hash);
+
+	variant = data + sizeof(*fixed_portion);
+	if (working_tree_encoding_len) {
+		memcpy(variant, working_tree_encoding, working_tree_encoding_len);
+		variant += working_tree_encoding_len;
+	}
+	memcpy(variant, pc_item->ce->name, name_len);
+
+	packet_write(fd, data, len_data);
+
+	free(data);
+}
+
+static void send_batch(int fd, size_t start, size_t nr)
+{
+	size_t i;
+	sigchain_push(SIGPIPE, SIG_IGN);
+	for (i = 0; i < nr; i++)
+		send_one_item(fd, &parallel_checkout.items[start + i]);
+	packet_flush(fd);
+	sigchain_pop(SIGPIPE);
+}
+
+static struct pc_worker *setup_workers(struct checkout *state, int num_workers)
+{
+	struct pc_worker *workers;
+	int i, workers_with_one_extra_item;
+	size_t base_batch_size, batch_beginning = 0;
+
+	ALLOC_ARRAY(workers, num_workers);
+
+	for (i = 0; i < num_workers; i++) {
+		struct child_process *cp = &workers[i].cp;
+
+		child_process_init(cp);
+		cp->git_cmd = 1;
+		cp->in = -1;
+		cp->out = -1;
+		cp->clean_on_exit = 1;
+		strvec_push(&cp->args, "checkout--worker");
+		if (state->base_dir_len)
+			strvec_pushf(&cp->args, "--prefix=%s", state->base_dir);
+		if (start_command(cp))
+			die("failed to spawn checkout worker");
+	}
+
+	base_batch_size = parallel_checkout.nr / num_workers;
+	workers_with_one_extra_item = parallel_checkout.nr % num_workers;
+
+	for (i = 0; i < num_workers; i++) {
+		struct pc_worker *worker = &workers[i];
+		size_t batch_size = base_batch_size;
+
+		/* distribute the extra work evenly */
+		if (i < workers_with_one_extra_item)
+			batch_size++;
+
+		send_batch(worker->cp.in, batch_beginning, batch_size);
+		worker->next_item_to_complete = batch_beginning;
+		worker->nr_items_to_complete = batch_size;
+
+		batch_beginning += batch_size;
+	}
+
+	return workers;
+}
+
+static void finish_workers(struct pc_worker *workers, int num_workers)
+{
+	int i;
+
+	/*
+	 * Close pipes before calling finish_command() to let the workers
+	 * exit asynchronously and avoid spending extra time on wait().
+	 */
+	for (i = 0; i < num_workers; i++) {
+		struct child_process *cp = &workers[i].cp;
+		if (cp->in >= 0)
+			close(cp->in);
+		if (cp->out >= 0)
+			close(cp->out);
+	}
+
+	for (i = 0; i < num_workers; i++) {
+		int rc = finish_command(&workers[i].cp);
+		if (rc > 128) {
+			/*
+			 * For a normal non-zero exit, the worker should have
+			 * already printed something useful to stderr. But a
+			 * death by signal should be mentioned to the user.
+			 */
+			error("checkout worker %d died of signal %d", i, rc - 128);
+		}
+	}
+
+	free(workers);
+}
+
+static inline void assert_pc_item_result_size(int got, int exp)
+{
+	if (got != exp)
+		BUG("wrong result size from checkout worker (got %dB, exp %dB)",
+		    got, exp);
+}
+
+static void parse_and_save_result(const char *buffer, int len,
+				  struct pc_worker *worker)
+{
+	struct pc_item_result *res;
+	struct parallel_checkout_item *pc_item;
+	struct stat *st = NULL;
+
+	if (len < PC_ITEM_RESULT_BASE_SIZE)
+		BUG("too short result from checkout worker (got %dB, exp >=%dB)",
+		    len, (int)PC_ITEM_RESULT_BASE_SIZE);
+
+	res = (struct pc_item_result *)buffer;
+
+	/*
+	 * Worker should send either the full result struct on success, or
+	 * just the base (i.e. no stat data), otherwise.
+	 */
+	if (res->status == PC_ITEM_WRITTEN) {
+		assert_pc_item_result_size(len, (int)sizeof(struct pc_item_result));
+		st = &res->st;
+	} else {
+		assert_pc_item_result_size(len, (int)PC_ITEM_RESULT_BASE_SIZE);
+	}
+
+	if (!worker->nr_items_to_complete)
+		BUG("received result from supposedly finished checkout worker");
+	if (res->id != worker->next_item_to_complete)
+		BUG("unexpected item id from checkout worker (got %"PRIuMAX", exp %"PRIuMAX")",
+		    (uintmax_t)res->id, (uintmax_t)worker->next_item_to_complete);
+
+	worker->next_item_to_complete++;
+	worker->nr_items_to_complete--;
+
+	pc_item = &parallel_checkout.items[res->id];
+	pc_item->status = res->status;
+	if (st)
+		pc_item->st = *st;
+}
+
+static void gather_results_from_workers(struct pc_worker *workers,
+					int num_workers)
+{
+	int i, active_workers = num_workers;
+	struct pollfd *pfds;
+
+	CALLOC_ARRAY(pfds, num_workers);
+	for (i = 0; i < num_workers; i++) {
+		pfds[i].fd = workers[i].cp.out;
+		pfds[i].events = POLLIN;
+	}
+
+	while (active_workers) {
+		int nr = poll(pfds, num_workers, -1);
+
+		if (nr < 0) {
+			if (errno == EINTR)
+				continue;
+			die_errno("failed to poll checkout workers");
+		}
+
+		for (i = 0; i < num_workers && nr > 0; i++) {
+			struct pc_worker *worker = &workers[i];
+			struct pollfd *pfd = &pfds[i];
+
+			if (!pfd->revents)
+				continue;
+
+			if (pfd->revents & POLLIN) {
+				int len = packet_read(pfd->fd, NULL, NULL,
+						      packet_buffer,
+						      sizeof(packet_buffer), 0);
+
+				if (len < 0) {
+					BUG("packet_read() returned negative value");
+				} else if (!len) {
+					pfd->fd = -1;
+					active_workers--;
+				} else {
+					parse_and_save_result(packet_buffer,
+							      len, worker);
+				}
+			} else if (pfd->revents & POLLHUP) {
+				pfd->fd = -1;
+				active_workers--;
+			} else if (pfd->revents & (POLLNVAL | POLLERR)) {
+				die("error polling from checkout worker");
+			}
+
+			nr--;
+		}
+	}
+
+	free(pfds);
+}
+
 static void write_items_sequentially(struct checkout *state)
 {
 	size_t i;
@@ -348,16 +584,28 @@ static void write_items_sequentially(struct checkout *state)
 		write_pc_item(&parallel_checkout.items[i], state);
 }
 
+static const int DEFAULT_NUM_WORKERS = 2;
+
 int run_parallel_checkout(struct checkout *state)
 {
-	int ret;
+	int ret, num_workers = DEFAULT_NUM_WORKERS;
 
 	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
 		BUG("cannot run parallel checkout: uninitialized or already running");
 
 	parallel_checkout.status = PC_RUNNING;
 
-	write_items_sequentially(state);
+	if (parallel_checkout.nr < num_workers)
+		num_workers = parallel_checkout.nr;
+
+	if (num_workers <= 1) {
+		write_items_sequentially(state);
+	} else {
+		struct pc_worker *workers = setup_workers(state, num_workers);
+		gather_results_from_workers(workers, num_workers);
+		finish_workers(workers, num_workers);
+	}
+
 	ret = handle_results(state);
 
 	finish_parallel_checkout();
diff --git a/parallel-checkout.h b/parallel-checkout.h
index 4ad2a519b3..ec58716519 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -1,9 +1,14 @@
 #ifndef PARALLEL_CHECKOUT_H
 #define PARALLEL_CHECKOUT_H
 
+#include "convert.h"
+
 struct cache_entry;
 struct checkout;
-struct conv_attrs;
+
+/****************************************************************
+ * Users of parallel checkout
+ ****************************************************************/
 
 enum pc_status {
 	PC_UNINITIALIZED = 0,
@@ -29,4 +34,70 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
 /* Write all the queued entries, returning 0 on success.*/
 int run_parallel_checkout(struct checkout *state);
 
+/****************************************************************
+ * Interface with checkout--worker
+ ****************************************************************/
+
+enum pc_item_status {
+	PC_ITEM_PENDING = 0,
+	PC_ITEM_WRITTEN,
+	/*
+	 * The entry could not be written because there was another file
+	 * already present in its path or leading directories. Since
+	 * checkout_entry_ca() removes such files from the working tree before
+	 * enqueueing the entry for parallel checkout, it means that there was
+	 * a path collision among the entries being written.
+	 */
+	PC_ITEM_COLLIDED,
+	PC_ITEM_FAILED,
+};
+
+struct parallel_checkout_item {
+	/*
+	 * In main process ce points to a istate->cache[] entry. Thus, it's not
+	 * owned by us. In workers they own the memory, which *must be* released.
+	 */
+	struct cache_entry *ce;
+	struct conv_attrs ca;
+	size_t id; /* position in parallel_checkout.items[] of main process */
+
+	/* Output fields, sent from workers. */
+	enum pc_item_status status;
+	struct stat st;
+};
+
+/*
+ * The fixed-size portion of `struct parallel_checkout_item` that is sent to the
+ * workers. Following this will be 2 strings: ca.working_tree_encoding and
+ * ce.name; These are NOT null terminated, since we have the size in the fixed
+ * portion.
+ *
+ * Note that not all fields of conv_attrs and cache_entry are passed, only the
+ * ones that will be required by the workers to smudge and write the entry.
+ */
+struct pc_item_fixed_portion {
+	size_t id;
+	struct object_id oid;
+	unsigned int ce_mode;
+	enum convert_crlf_action crlf_action;
+	int ident;
+	size_t working_tree_encoding_len;
+	size_t name_len;
+};
+
+/*
+ * The fields of `struct parallel_checkout_item` that are returned by the
+ * workers. Note: `st` must be the last one, as it is omitted on error.
+ */
+struct pc_item_result {
+	size_t id;
+	enum pc_item_status status;
+	struct stat st;
+};
+
+#define PC_ITEM_RESULT_BASE_SIZE offsetof(struct pc_item_result, st)
+
+void write_pc_item(struct parallel_checkout_item *pc_item,
+		   struct checkout *state);
+
 #endif /* PARALLEL_CHECKOUT_H */
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v3 3/5] parallel-checkout: add configuration options
  2021-04-19  0:14   ` [PATCH v3 " Matheus Tavares
  2021-04-19  0:14     ` [PATCH v3 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
  2021-04-19  0:14     ` [PATCH v3 2/5] parallel-checkout: make it truly parallel Matheus Tavares
@ 2021-04-19  0:14     ` Matheus Tavares
  2021-04-19  0:14     ` [PATCH v3 4/5] parallel-checkout: support progress displaying Matheus Tavares
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-19  0:14 UTC (permalink / raw)
  To: gitster; +Cc: git, christian.couder, git

Make parallel checkout configurable by introducing two new settings:
checkout.workers and checkout.thresholdForParallelism. The first defines
the number of workers (where one means sequential checkout), and the
second defines the minimum number of entries to attempt parallel
checkout.

To decide the default value for checkout.workers, the parallel version
was benchmarked during three operations in the linux repo, with cold
cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and
checking out v5.8 from v5.7 (checkout II). The four tables below show
the mean run times and standard deviations for 5 runs in: a local file
system on SSD, a local file system on HDD, a Linux NFS server, and
Amazon EFS (all on Linux). Each parallel checkout test was executed with
the number of workers that brings the best overall results in that
environment.

Local SSD:
             Sequential             10 workers            Speedup
Clone        8.805 s ± 0.043 s      3.564 s ± 0.041 s     2.47 ± 0.03
Checkout I   9.678 s ± 0.057 s      4.486 s ± 0.050 s     2.16 ± 0.03
Checkout II  5.034 s ± 0.072 s      3.021 s ± 0.038 s     1.67 ± 0.03

Local HDD:
             Sequential             10 workers             Speedup
Clone        32.288 s ± 0.580 s     30.724 s ± 0.522 s    1.05 ± 0.03
Checkout I   54.172 s ±  7.119 s    54.429 s ± 6.738 s    1.00 ± 0.18
Checkout II  40.465 s ± 2.402 s     38.682 s ± 1.365 s    1.05 ± 0.07

Linux NFS server (v4.1, on EBS, single availability zone):

             Sequential             32 workers            Speedup
Clone        240.368 s ± 6.347 s    57.349 s ± 0.870 s    4.19 ± 0.13
Checkout I   242.862 s ± 2.215 s    58.700 s ± 0.904 s    4.14 ± 0.07
Checkout II  65.751 s ± 1.577 s     23.820 s ± 0.407 s    2.76 ± 0.08

EFS (v4.1, replicated over multiple availability zones):

             Sequential             32 workers            Speedup
Clone        922.321 s ± 2.274 s    210.453 s ± 3.412 s   4.38 ± 0.07
Checkout I   1011.300 s ± 7.346 s   297.828 s ± 0.964 s   3.40 ± 0.03
Checkout II  294.104 s ± 1.836 s    126.017 s ± 1.190 s   2.33 ± 0.03

The above benchmarks show that parallel checkout is most effective on
repositories located on an SSD or over a distributed file system. For
local file systems on spinning disks, and/or older machines, the
parallelism does not always bring a good performance. For this reason,
the default value for checkout.workers is one, a.k.a. sequential
checkout.

To decide the default value for checkout.thresholdForParallelism,
another benchmark was executed in the "Local SSD" setup, where parallel
checkout showed to be beneficial. This time, we compared the runtime of
a `git checkout -f`, with and without parallelism, after randomly
removing an increasing number of files from the Linux working tree. The
"sequential fallback" column below corresponds to the executions where
checkout.workers was 10 but checkout.thresholdForParallelism was equal
to the number of to-be-updated files plus one (so that we end up writing
sequentially). Each test case was sampled 15 times, and each sample had
a randomly different set of files removed. Here are the results:

             sequential fallback   10 workers           speedup
10   files    772.3 ms ± 12.6 ms   769.0 ms ± 13.6 ms   1.00 ± 0.02
20   files    780.5 ms ± 15.8 ms   775.2 ms ±  9.2 ms   1.01 ± 0.02
50   files    806.2 ms ± 13.8 ms   767.4 ms ±  8.5 ms   1.05 ± 0.02
100  files    833.7 ms ± 21.4 ms   750.5 ms ± 16.8 ms   1.11 ± 0.04
200  files    897.6 ms ± 30.9 ms   730.5 ms ± 14.7 ms   1.23 ± 0.05
500  files   1035.4 ms ± 48.0 ms   677.1 ms ± 22.3 ms   1.53 ± 0.09
1000 files   1244.6 ms ± 35.6 ms   654.0 ms ± 38.3 ms   1.90 ± 0.12
2000 files   1488.8 ms ± 53.4 ms   658.8 ms ± 23.8 ms   2.26 ± 0.12

From the above numbers, 100 files seems to be a reasonable default value
for the threshold setting.

Note: Up to 1000 files, we observe a drop in the execution time of the
parallel code with an increase in the number of files. This is a rather
odd behavior, but it was observed in multiple repetitions. Above 1000
files, the execution time increases according to the number of files, as
one would expect.

About the test environments: Local SSD tests were executed on an
i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local
HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores
with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running
Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge
instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large
instance with 2 vCPUSs and a 1 TB EBS GP2 volume. Before each timing,
the linux repository was removed (or checked out back to its previous
state), and `sync && sysctl vm.drop_caches=3` was executed.

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/config/checkout.txt | 21 +++++++++++++++++++++
 parallel-checkout.c               | 24 +++++++++++++++++++-----
 parallel-checkout.h               |  9 +++++++--
 unpack-trees.c                    | 10 +++++++---
 4 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/Documentation/config/checkout.txt b/Documentation/config/checkout.txt
index 2cddf7b4b4..bfbca90f0e 100644
--- a/Documentation/config/checkout.txt
+++ b/Documentation/config/checkout.txt
@@ -21,3 +21,24 @@ checkout.guess::
 	Provides the default value for the `--guess` or `--no-guess`
 	option in `git checkout` and `git switch`. See
 	linkgit:git-switch[1] and linkgit:git-checkout[1].
+
+checkout.workers::
+	The number of parallel workers to use when updating the working tree.
+	The default is one, i.e. sequential execution. If set to a value less
+	than one, Git will use as many workers as the number of logical cores
+	available. This setting and `checkout.thresholdForParallelism` affect
+	all commands that perform checkout. E.g. checkout, clone, reset,
+	sparse-checkout, etc.
++
+Note: parallel checkout usually delivers better performance for repositories
+located on SSDs or over NFS. For repositories on spinning disks and/or machines
+with a small number of cores, the default sequential checkout often performs
+better. The size and compression level of a repository might also influence how
+well the parallel version performs.
+
+checkout.thresholdForParallelism::
+	When running parallel checkout with a small number of files, the cost
+	of subprocess spawning and inter-process communication might outweigh
+	the parallelization gains. This setting allows to define the minimum
+	number of files for which parallel checkout should be attempted. The
+	default is 100.
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 836154fec6..3cc2028861 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -1,10 +1,12 @@
 #include "cache.h"
+#include "config.h"
 #include "entry.h"
 #include "parallel-checkout.h"
 #include "pkt-line.h"
 #include "run-command.h"
 #include "sigchain.h"
 #include "streaming.h"
+#include "thread-utils.h"
 
 struct pc_worker {
 	struct child_process cp;
@@ -24,6 +26,20 @@ enum pc_status parallel_checkout_status(void)
 	return parallel_checkout.status;
 }
 
+static const int DEFAULT_THRESHOLD_FOR_PARALLELISM = 100;
+static const int DEFAULT_NUM_WORKERS = 1;
+
+void get_parallel_checkout_configs(int *num_workers, int *threshold)
+{
+	if (git_config_get_int("checkout.workers", num_workers))
+		*num_workers = DEFAULT_NUM_WORKERS;
+	else if (*num_workers < 1)
+		*num_workers = online_cpus();
+
+	if (git_config_get_int("checkout.thresholdForParallelism", threshold))
+		*threshold = DEFAULT_THRESHOLD_FOR_PARALLELISM;
+}
+
 void init_parallel_checkout(void)
 {
 	if (parallel_checkout.status != PC_UNINITIALIZED)
@@ -584,11 +600,9 @@ static void write_items_sequentially(struct checkout *state)
 		write_pc_item(&parallel_checkout.items[i], state);
 }
 
-static const int DEFAULT_NUM_WORKERS = 2;
-
-int run_parallel_checkout(struct checkout *state)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
 {
-	int ret, num_workers = DEFAULT_NUM_WORKERS;
+	int ret;
 
 	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
 		BUG("cannot run parallel checkout: uninitialized or already running");
@@ -598,7 +612,7 @@ int run_parallel_checkout(struct checkout *state)
 	if (parallel_checkout.nr < num_workers)
 		num_workers = parallel_checkout.nr;
 
-	if (num_workers <= 1) {
+	if (num_workers <= 1 || parallel_checkout.nr < threshold) {
 		write_items_sequentially(state);
 	} else {
 		struct pc_worker *workers = setup_workers(state, num_workers);
diff --git a/parallel-checkout.h b/parallel-checkout.h
index ec58716519..2a68ab954d 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -17,6 +17,7 @@ enum pc_status {
 };
 
 enum pc_status parallel_checkout_status(void);
+void get_parallel_checkout_configs(int *num_workers, int *threshold);
 
 /*
  * Put parallel checkout into the PC_ACCEPTING_ENTRIES state. Should be used
@@ -31,8 +32,12 @@ void init_parallel_checkout(void);
  */
 int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
 
-/* Write all the queued entries, returning 0 on success.*/
-int run_parallel_checkout(struct checkout *state);
+/*
+ * Write all the queued entries, returning 0 on success. If the number of
+ * entries is smaller than the specified threshold, the operation is performed
+ * sequentially.
+ */
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
 
 /****************************************************************
  * Interface with checkout--worker
diff --git a/unpack-trees.c b/unpack-trees.c
index f0430d458d..0669748f21 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -399,7 +399,7 @@ static int check_updates(struct unpack_trees_options *o,
 	int errs = 0;
 	struct progress *progress;
 	struct checkout state = CHECKOUT_INIT;
-	int i;
+	int i, pc_workers, pc_threshold;
 
 	trace_performance_enter();
 	state.force = 1;
@@ -465,8 +465,11 @@ static int check_updates(struct unpack_trees_options *o,
 		oid_array_clear(&to_fetch);
 	}
 
+	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+
 	enable_delayed_checkout(&state);
-	init_parallel_checkout();
+	if (pc_workers > 1)
+		init_parallel_checkout();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -480,7 +483,8 @@ static int check_updates(struct unpack_trees_options *o,
 		}
 	}
 	stop_progress(&progress);
-	errs |= run_parallel_checkout(&state);
+	if (pc_workers > 1)
+		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
 	errs |= finish_delayed_checkout(&state, NULL);
 	git_attr_set_direction(GIT_ATTR_CHECKIN);
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v3 4/5] parallel-checkout: support progress displaying
  2021-04-19  0:14   ` [PATCH v3 " Matheus Tavares
                       ` (2 preceding siblings ...)
  2021-04-19  0:14     ` [PATCH v3 3/5] parallel-checkout: add configuration options Matheus Tavares
@ 2021-04-19  0:14     ` Matheus Tavares
  2021-04-19  0:14     ` [PATCH v3 5/5] parallel-checkout: add design documentation Matheus Tavares
  2021-04-19 19:53     ` [PATCH v4 0/5] Parallel Checkout (part 2) Matheus Tavares
  5 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-19  0:14 UTC (permalink / raw)
  To: gitster; +Cc: git, christian.couder, git

Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 parallel-checkout.c | 34 +++++++++++++++++++++++++++++++---
 parallel-checkout.h |  5 ++++-
 unpack-trees.c      | 11 ++++++++---
 3 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/parallel-checkout.c b/parallel-checkout.c
index 3cc2028861..09e8b10a35 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -3,6 +3,7 @@
 #include "entry.h"
 #include "parallel-checkout.h"
 #include "pkt-line.h"
+#include "progress.h"
 #include "run-command.h"
 #include "sigchain.h"
 #include "streaming.h"
@@ -17,6 +18,8 @@ struct parallel_checkout {
 	enum pc_status status;
 	struct parallel_checkout_item *items; /* The parallel checkout queue. */
 	size_t nr, alloc;
+	struct progress *progress;
+	unsigned int *progress_cnt;
 };
 
 static struct parallel_checkout parallel_checkout;
@@ -146,6 +149,20 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
 	return 0;
 }
 
+size_t pc_queue_size(void)
+{
+	return parallel_checkout.nr;
+}
+
+static void advance_progress_meter(void)
+{
+	if (parallel_checkout.progress) {
+		(*parallel_checkout.progress_cnt)++;
+		display_progress(parallel_checkout.progress,
+				 *parallel_checkout.progress_cnt);
+	}
+}
+
 static int handle_results(struct checkout *state)
 {
 	int ret = 0;
@@ -194,6 +211,7 @@ static int handle_results(struct checkout *state)
 			 */
 			ret |= checkout_entry_ca(pc_item->ce, &pc_item->ca,
 						 state, NULL, NULL);
+			advance_progress_meter();
 			break;
 		case PC_ITEM_PENDING:
 			have_pending = 1;
@@ -534,6 +552,9 @@ static void parse_and_save_result(const char *buffer, int len,
 	pc_item->status = res->status;
 	if (st)
 		pc_item->st = *st;
+
+	if (res->status != PC_ITEM_COLLIDED)
+		advance_progress_meter();
 }
 
 static void gather_results_from_workers(struct pc_worker *workers,
@@ -596,11 +617,16 @@ static void write_items_sequentially(struct checkout *state)
 {
 	size_t i;
 
-	for (i = 0; i < parallel_checkout.nr; i++)
-		write_pc_item(&parallel_checkout.items[i], state);
+	for (i = 0; i < parallel_checkout.nr; i++) {
+		struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+		write_pc_item(pc_item, state);
+		if (pc_item->status != PC_ITEM_COLLIDED)
+			advance_progress_meter();
+	}
 }
 
-int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold,
+			  struct progress *progress, unsigned int *progress_cnt)
 {
 	int ret;
 
@@ -608,6 +634,8 @@ int run_parallel_checkout(struct checkout *state, int num_workers, int threshold
 		BUG("cannot run parallel checkout: uninitialized or already running");
 
 	parallel_checkout.status = PC_RUNNING;
+	parallel_checkout.progress = progress;
+	parallel_checkout.progress_cnt = progress_cnt;
 
 	if (parallel_checkout.nr < num_workers)
 		num_workers = parallel_checkout.nr;
diff --git a/parallel-checkout.h b/parallel-checkout.h
index 2a68ab954d..80f539bcb7 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -5,6 +5,7 @@
 
 struct cache_entry;
 struct checkout;
+struct progress;
 
 /****************************************************************
  * Users of parallel checkout
@@ -31,13 +32,15 @@ void init_parallel_checkout(void);
  * for later write and return 0.
  */
 int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
+size_t pc_queue_size(void);
 
 /*
  * Write all the queued entries, returning 0 on success. If the number of
  * entries is smaller than the specified threshold, the operation is performed
  * sequentially.
  */
-int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold,
+			  struct progress *progress, unsigned int *progress_cnt);
 
 /****************************************************************
  * Interface with checkout--worker
diff --git a/unpack-trees.c b/unpack-trees.c
index 0669748f21..4b77e52c6b 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -474,17 +474,22 @@ static int check_updates(struct unpack_trees_options *o,
 		struct cache_entry *ce = index->cache[i];
 
 		if (ce->ce_flags & CE_UPDATE) {
+			size_t last_pc_queue_size = pc_queue_size();
+
 			if (ce->ce_flags & CE_WT_REMOVE)
 				BUG("both update and delete flags are set on %s",
 				    ce->name);
-			display_progress(progress, ++cnt);
 			ce->ce_flags &= ~CE_UPDATE;
 			errs |= checkout_entry(ce, &state, NULL, NULL);
+
+			if (last_pc_queue_size == pc_queue_size())
+				display_progress(progress, ++cnt);
 		}
 	}
-	stop_progress(&progress);
 	if (pc_workers > 1)
-		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
+		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+					      progress, &cnt);
+	stop_progress(&progress);
 	errs |= finish_delayed_checkout(&state, NULL);
 	git_attr_set_direction(GIT_ATTR_CHECKIN);
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v3 5/5] parallel-checkout: add design documentation
  2021-04-19  0:14   ` [PATCH v3 " Matheus Tavares
                       ` (3 preceding siblings ...)
  2021-04-19  0:14     ` [PATCH v3 4/5] parallel-checkout: support progress displaying Matheus Tavares
@ 2021-04-19  0:14     ` Matheus Tavares
  2021-04-19  9:36       ` Christian Couder
  2021-04-19 19:53     ` [PATCH v4 0/5] Parallel Checkout (part 2) Matheus Tavares
  5 siblings, 1 reply; 40+ messages in thread
From: Matheus Tavares @ 2021-04-19  0:14 UTC (permalink / raw)
  To: gitster; +Cc: git, christian.couder, git

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/Makefile                        |   1 +
 Documentation/technical/parallel-checkout.txt | 271 ++++++++++++++++++
 2 files changed, 272 insertions(+)
 create mode 100644 Documentation/technical/parallel-checkout.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 81d1bf7a04..af236927c9 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -90,6 +90,7 @@ TECH_DOCS += technical/multi-pack-index
 TECH_DOCS += technical/pack-format
 TECH_DOCS += technical/pack-heuristics
 TECH_DOCS += technical/pack-protocol
+TECH_DOCS += technical/parallel-checkout
 TECH_DOCS += technical/partial-clone
 TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
diff --git a/Documentation/technical/parallel-checkout.txt b/Documentation/technical/parallel-checkout.txt
new file mode 100644
index 0000000000..fddb0ed4fd
--- /dev/null
+++ b/Documentation/technical/parallel-checkout.txt
@@ -0,0 +1,271 @@
+Parallel Checkout Design Notes
+==============================
+
+The "Parallel Checkout" feature attempts to use multiple processes to
+parallelize the work of uncompressing the blobs, applying in-core
+filters, and writing the resulting contents to the working tree during a
+checkout operation. It can be used by all checkout-related commands,
+such as `clone`, `checkout`, `reset`, `sparse-checkout`, and others.
+
+These commands share the following basic structure:
+
+* Step 1: Read the current index file into memory.
+
+* Step 2: Modify the in-memory index based upon the command, and
+  temporarily mark all cache entries that need to be updated.
+
+* Step 3: Populate the working tree to match the new candidate index.
+  This includes iterating over all of the to-be-updated cache entries
+  and delete, create, or overwrite the associated files in the working
+  tree.
+
+* Step 4: Write the new index to disk.
+
+Step 3 is the focus of the "parallel checkout" effort described here.
+It dominates the execution time for most of the above command types.
+
+Sequential Implementation
+-------------------------
+
+For the purposes of discussion here, the current sequential
+implementation of Step 3 is divided in 3 parts, each one implemented in
+its own function:
+
+* Step 3a: `unpack-trees.c:check_updates()` contains a series of
+  sequential loops iterating over the `cache_entry`'s array. The main
+  loop in this function calls the Step 3b function for each of the
+  to-be-updated entries.
+
+* Step 3b: `entry.c:checkout_entry()` examines the existing working tree
+  for file conflicts, collisions, and unsaved changes. It removes files
+  and creates leading directories as necessary. It calls the Step 3c
+  function for each entry to be written.
+
+* Step 3c: `entry.c:write_entry()` loads the blob into memory, smudges
+  it if necessary, creates the file in the working tree, writes the
+  smudged contents, calls `fstat()` or `lstat()`, and updates the
+  associated `cache_entry` struct with the stat information gathered.
+
+It wouldn't be safe to perform Step 3b in parallel, as there could be
+race conditions between file creations and removals. Instead, the
+parallel checkout framework lets the sequential code handle Step 3b,
+and use parallel workers to replace the sequential
+`entry.c:write_entry()` calls from Step 3c.
+
+Rejected Multi-Threaded Solution
+--------------------------------
+
+The most "straightforward" implementation would be to spread the set of
+to-be-updated cache entries across multiple threads. But due to the
+thread-unsafe functions in the ODB code, we would have to use locks to
+coordinate the parallel operation. An early prototype of this solution
+showed that the multi-threaded checkout would bring performance
+improvements over the sequential code, but there was still too much lock
+contention. A `perf` profiling indicated that around 20% of the runtime
+during a local Linux clone (on an SSD) was spent in locking functions.
+For this reason this approach was rejected in favor of using multiple
+child processes, which led to a better performance.
+
+Multi-Process Solution
+----------------------
+
+Parallel checkout alters the aforementioned Step 3 to use multiple
+`checkout--worker` background processes to distribute the work. The
+long-running worker processes are controlled by the foreground Git
+command using the existing run-command API.
+
+Overview
+~~~~~~~~
+
+Step 3b is only slightly altered; for each entry to be checked out, the
+main process performs the following steps:
+
+* M1: Check whether there is any untracked or unclean file in the
+  working tree which would be overwritten by this entry, and decide
+  whether to proceed (removing the file(s)) or not.
+
+* M2: Create the leading directories.
+
+* M3: Load the conversion attributes for the entry's path.
+
+* M4: Check, based on the entry's type and conversion attributes,
+  whether the entry is eligible for parallel checkout (more on this
+  later). If it is eligible, enqueue the entry and the loaded
+  attributes to later write the entry in parallel. If not, write the
+  entry right away, using the default sequential code.
+
+Note: we save the conversion attributes associated with each entry
+because the workers don't have access to the main process' index state,
+so they can't load the attributes by themselves (and the attributes are
+needed to properly smudge the entry). Additionally, this has a positive
+impact on performance as (1) we don't need to load the attributes twice
+and (2) the attributes machinery is optimized to handle paths in
+sequential order.
+
+After all entries have passed through the above steps, the main process
+checks if the number of enqueued entries is sufficient to spread among
+the workers. If not, it just writes them sequentially. Otherwise, it
+spawns the workers and distributes the queued entries uniformly in
+continuous chunks. This aims to minimize the chances of two workers
+writing to the same directory simultaneously, which could increase lock
+contention in the kernel.
+
+Then, for each assigned item, each worker:
+
+* W1: Checks if there is any non-directory file in the leading part of
+  the entry's path or if there already exists a file at the entry' path.
+  If so, mark the entry with `PC_ITEM_COLLIDED` and skip it (more on
+  this later).
+
+* W2: Creates the file (with O_CREAT and O_EXCL).
+
+* W3: Loads the blob into memory (inflating and delta reconstructing
+  it).
+
+* W4: Applies any required in-process filter, like end-of-line
+  conversion and re-encoding.
+
+* W5: Writes the result to the file descriptor opened at W2.
+
+* W6: Calls `fstat()` or lstat()` on the just-written path, and sends
+  the result back to the main process, together with the end status of
+  the operation and the item's identification number.
+
+Note that, when possible, steps W3 to W5 are delegated to the streaming
+machinery, removing the need to keep the entire blob in memory.
+
+If the worker fails to read the blob or to write it to the working tree,
+it removes the created file to avoid leaving empty files behind. This is
+the *only* time a worker is allowed to remove a file.
+
+As mentioned earlier, it is the responsibility of the main process to
+remove any file that blocks the checkout operation (or abort if the
+removal(s) would cause data loss and the user didn't ask to `--force`).
+This is crucial to avoid race conditions and also to properly detect
+path collisions at Step W1.
+
+After the workers finish writing the items and sending back the required
+information, the main process handles the results in two steps:
+
+- First, it updates the in-memory index with the `lstat()` information
+  sent by the workers. (This must be done first as this information
+  might me required in the following step.)
+
+- Then it writes the items which collided on disk (i.e. items marked
+  with `PC_ITEM_COLLIDED`). More on this below.
+
+Path Collisions
+---------------
+
+Path collisions happen when two different paths correspond to the same
+entry in the file system. E.g. the paths 'a' and 'A' would collide in a
+case-insensitive file system.
+
+The sequential checkout deals with collisions in the same way that it
+deals with files that were already present in the working tree before
+checkout. Basically, it checks if the path that it wants to write
+already exists on disk, makes sure the existing file doesn't have
+unsaved data, and then overwrites it. (To be more pedantic: it deletes
+the existing file and creates the new one.) So, if there are multiple
+colliding files to be checked out, the sequential code will write each
+one of them but only the last will actually survive on disk.
+
+Parallel checkout aims to reproduce the same behavior. However, we
+cannot let the workers racily write to the same file on disk. Instead,
+the workers detect when the entry that they want to check out would
+collide with an existing file, and mark it with `PC_ITEM_COLLIDED`.
+Later, the main process can sequentially feed these entries back to
+`checkout_entry()` without the risk of race conditions. On clone, this
+also has the effect of marking the colliding entries to later emit a
+warning for the user, like the classic sequential checkout does.
+
+The workers are able to detect both collisions among the entries being
+concurrently written and collisions among parallel-eligible and
+ineligible entries. The general idea for collision detection is quite
+straightforward: for each parallel-eligible entry, the main process must
+remove all files that prevent this entry from being written (before
+enqueueing it). This includes any non-directory file in the leading path
+of the entry. Later, when a worker gets assigned the entry, it looks
+again for the non-directories files and for an already existing file at
+the entry's path. If any of these checks finds something, the worker
+knows that there was a path collision.
+
+Because parallel checkout can distinguish path collisions from the case
+where the file was already present in the working tree before checkout,
+we could alternatively choose to skip the checkout of colliding entries.
+However, each entry that doesn't get written would have NULL `lstat()`
+fields on the index. This could cause performance penalties for
+subsequent commands that need to refresh the index, as they would have
+to go to the file system to see if the entry is dirty. Thus, if we have
+N entries in a colliding group and we decide to write and `lstat()` only
+one of them, every subsequent `git-status` will have to read, convert,
+and hash the written file N - 1 times. By checking out all colliding
+entries (like the sequential code does), we only pay the overhead once,
+during checkout.
+
+Eligible Entries for Parallel Checkout
+--------------------------------------
+
+As previously mentioned, not all entries passed to `checkout_entry()`
+will be considered eligible for parallel checkout. More specifically, we
+exclude:
+
+- Symbolic links; to avoid race conditions that, in combination with
+  path collisions, could cause workers to write files at the wrong
+  place. For example, if we were to concurrently check out a symlink
+  'a' -> 'b' and a regular file 'A/f' in a case-insensitive file system,
+  we could potentially end up writing the file 'A/f' at 'a/f', due to a
+  race condition.
+
+- Regular files that require external filters (either "one shot" filters
+  or long-running process filters). These filters are black-boxes to Git
+  and may have their own internal locking or non-concurrent assumptions.
+  So it might not be safe to run multiple instances in parallel.
++
+Besides, long-running filters may use the delayed checkout feature to
+postpone the return of some filtered blobs. The delayed checkout queue
+and the parallel checkout queue are not compatible and should remain
+separated.
++
+Note: regular files that only require internal filters, like end-of-line
+conversion and re-encoding, are eligible for parallel checkout.
+
+Ineligible entries are checked out by the classic sequential codepath
+*before* spawning workers.
+
+Note: submodules's files are also eligible for parallel checkout (as
+long as they don't fall into any of the excluding categories mentioned
+above). But since each submodule is checked out in its own child
+process, we don't mix the superproject's and the submodules' files in
+the same parallel checkout process or queue.
+
+The API
+-------
+
+The parallel checkout API was designed with the goal to minimize changes
+to the current users of the checkout machinery. This means that they
+don't have to call a different function for sequential or parallel
+checkout. As already mentioned, `checkout_entry()` will automatically
+insert the given entry in the parallel checkout queue when this feature
+is enabled and the entry is eligible; otherwise, it will just write the
+entry right away, using the sequential code. In general, callers of the
+parallel checkout API should look similar to this:
+
+----------------------------------------------
+int pc_workers, pc_threshold, err = 0;
+struct checkout state;
+
+get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+
+/*
+ * This check is not strictly required, but it
+ * should save some time in sequential mode.
+ */
+if (pc_workers > 1)
+	init_parallel_checkout();
+
+for (each cache_entry ce to-be-updated)
+	err |= checkout_entry(ce, &state, NULL, NULL);
+
+err |= run_parallel_checkout(&state, pc_workers, pc_threshold, NULL, NULL);
+----------------------------------------------
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 5/5] parallel-checkout: add design documentation
  2021-04-19  0:14     ` [PATCH v3 5/5] parallel-checkout: add design documentation Matheus Tavares
@ 2021-04-19  9:36       ` Christian Couder
  0 siblings, 0 replies; 40+ messages in thread
From: Christian Couder @ 2021-04-19  9:36 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: Junio C Hamano, git, Jeff Hostetler

On Mon, Apr 19, 2021 at 2:15 AM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  Documentation/Makefile                        |   1 +
>  Documentation/technical/parallel-checkout.txt | 271 ++++++++++++++++++
>  2 files changed, 272 insertions(+)
>  create mode 100644 Documentation/technical/parallel-checkout.txt
>
> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index 81d1bf7a04..af236927c9 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -90,6 +90,7 @@ TECH_DOCS += technical/multi-pack-index
>  TECH_DOCS += technical/pack-format
>  TECH_DOCS += technical/pack-heuristics
>  TECH_DOCS += technical/pack-protocol
> +TECH_DOCS += technical/parallel-checkout
>  TECH_DOCS += technical/partial-clone
>  TECH_DOCS += technical/protocol-capabilities
>  TECH_DOCS += technical/protocol-common
> diff --git a/Documentation/technical/parallel-checkout.txt b/Documentation/technical/parallel-checkout.txt
> new file mode 100644
> index 0000000000..fddb0ed4fd
> --- /dev/null
> +++ b/Documentation/technical/parallel-checkout.txt
> @@ -0,0 +1,271 @@
> +Parallel Checkout Design Notes
> +==============================
> +
> +The "Parallel Checkout" feature attempts to use multiple processes to
> +parallelize the work of uncompressing the blobs, applying in-core
> +filters, and writing the resulting contents to the working tree during a
> +checkout operation. It can be used by all checkout-related commands,
> +such as `clone`, `checkout`, `reset`, `sparse-checkout`, and others.
> +
> +These commands share the following basic structure:
> +
> +* Step 1: Read the current index file into memory.
> +
> +* Step 2: Modify the in-memory index based upon the command, and
> +  temporarily mark all cache entries that need to be updated.
> +
> +* Step 3: Populate the working tree to match the new candidate index.
> +  This includes iterating over all of the to-be-updated cache entries
> +  and delete, create, or overwrite the associated files in the working
> +  tree.
> +
> +* Step 4: Write the new index to disk.
> +
> +Step 3 is the focus of the "parallel checkout" effort described here.
> +It dominates the execution time for most of the above command types.

Maybe: s/command types/checkout-related commands/

> +Sequential Implementation
> +-------------------------
> +
> +For the purposes of discussion here, the current sequential
> +implementation of Step 3 is divided in 3 parts, each one implemented in
> +its own function:
> +
> +* Step 3a: `unpack-trees.c:check_updates()` contains a series of
> +  sequential loops iterating over the `cache_entry`'s array. The main
> +  loop in this function calls the Step 3b function for each of the
> +  to-be-updated entries.
> +
> +* Step 3b: `entry.c:checkout_entry()` examines the existing working tree
> +  for file conflicts, collisions, and unsaved changes. It removes files
> +  and creates leading directories as necessary. It calls the Step 3c
> +  function for each entry to be written.
> +
> +* Step 3c: `entry.c:write_entry()` loads the blob into memory, smudges
> +  it if necessary, creates the file in the working tree, writes the
> +  smudged contents, calls `fstat()` or `lstat()`, and updates the
> +  associated `cache_entry` struct with the stat information gathered.
> +
> +It wouldn't be safe to perform Step 3b in parallel, as there could be
> +race conditions between file creations and removals. Instead, the
> +parallel checkout framework lets the sequential code handle Step 3b,
> +and use parallel workers to replace the sequential

s/use/uses/

> +`entry.c:write_entry()` calls from Step 3c.
> +
> +Rejected Multi-Threaded Solution
> +--------------------------------
> +
> +The most "straightforward" implementation would be to spread the set of
> +to-be-updated cache entries across multiple threads. But due to the
> +thread-unsafe functions in the ODB code, we would have to use locks to
> +coordinate the parallel operation. An early prototype of this solution
> +showed that the multi-threaded checkout would bring performance
> +improvements over the sequential code, but there was still too much lock
> +contention. A `perf` profiling indicated that around 20% of the runtime
> +during a local Linux clone (on an SSD) was spent in locking functions.
> +For this reason this approach was rejected in favor of using multiple
> +child processes, which led to a better performance.
> +
> +Multi-Process Solution
> +----------------------
> +
> +Parallel checkout alters the aforementioned Step 3 to use multiple
> +`checkout--worker` background processes to distribute the work. The
> +long-running worker processes are controlled by the foreground Git
> +command using the existing run-command API.
> +
> +Overview
> +~~~~~~~~
> +
> +Step 3b is only slightly altered; for each entry to be checked out, the
> +main process performs the following steps:
> +
> +* M1: Check whether there is any untracked or unclean file in the
> +  working tree which would be overwritten by this entry, and decide
> +  whether to proceed (removing the file(s)) or not.
> +
> +* M2: Create the leading directories.
> +
> +* M3: Load the conversion attributes for the entry's path.
> +
> +* M4: Check, based on the entry's type and conversion attributes,
> +  whether the entry is eligible for parallel checkout (more on this
> +  later). If it is eligible, enqueue the entry and the loaded
> +  attributes to later write the entry in parallel. If not, write the
> +  entry right away, using the default sequential code.
> +
> +Note: we save the conversion attributes associated with each entry
> +because the workers don't have access to the main process' index state,
> +so they can't load the attributes by themselves (and the attributes are
> +needed to properly smudge the entry). Additionally, this has a positive
> +impact on performance as (1) we don't need to load the attributes twice
> +and (2) the attributes machinery is optimized to handle paths in
> +sequential order.
> +
> +After all entries have passed through the above steps, the main process
> +checks if the number of enqueued entries is sufficient to spread among
> +the workers. If not, it just writes them sequentially. Otherwise, it
> +spawns the workers and distributes the queued entries uniformly in
> +continuous chunks. This aims to minimize the chances of two workers
> +writing to the same directory simultaneously, which could increase lock
> +contention in the kernel.
> +
> +Then, for each assigned item, each worker:
> +
> +* W1: Checks if there is any non-directory file in the leading part of
> +  the entry's path or if there already exists a file at the entry' path.
> +  If so, mark the entry with `PC_ITEM_COLLIDED` and skip it (more on
> +  this later).
> +
> +* W2: Creates the file (with O_CREAT and O_EXCL).
> +
> +* W3: Loads the blob into memory (inflating and delta reconstructing
> +  it).
> +
> +* W4: Applies any required in-process filter, like end-of-line
> +  conversion and re-encoding.
> +
> +* W5: Writes the result to the file descriptor opened at W2.
> +
> +* W6: Calls `fstat()` or lstat()` on the just-written path, and sends
> +  the result back to the main process, together with the end status of
> +  the operation and the item's identification number.
> +
> +Note that, when possible, steps W3 to W5 are delegated to the streaming
> +machinery, removing the need to keep the entire blob in memory.
> +
> +If the worker fails to read the blob or to write it to the working tree,
> +it removes the created file to avoid leaving empty files behind. This is
> +the *only* time a worker is allowed to remove a file.
> +
> +As mentioned earlier, it is the responsibility of the main process to
> +remove any file that blocks the checkout operation (or abort if the
> +removal(s) would cause data loss and the user didn't ask to `--force`).
> +This is crucial to avoid race conditions and also to properly detect
> +path collisions at Step W1.
> +
> +After the workers finish writing the items and sending back the required
> +information, the main process handles the results in two steps:
> +
> +- First, it updates the in-memory index with the `lstat()` information
> +  sent by the workers. (This must be done first as this information
> +  might me required in the following step.)
> +
> +- Then it writes the items which collided on disk (i.e. items marked
> +  with `PC_ITEM_COLLIDED`). More on this below.
> +
> +Path Collisions
> +---------------
> +
> +Path collisions happen when two different paths correspond to the same
> +entry in the file system. E.g. the paths 'a' and 'A' would collide in a
> +case-insensitive file system.
> +
> +The sequential checkout deals with collisions in the same way that it
> +deals with files that were already present in the working tree before
> +checkout. Basically, it checks if the path that it wants to write
> +already exists on disk, makes sure the existing file doesn't have
> +unsaved data, and then overwrites it. (To be more pedantic: it deletes
> +the existing file and creates the new one.) So, if there are multiple
> +colliding files to be checked out, the sequential code will write each
> +one of them but only the last will actually survive on disk.
> +
> +Parallel checkout aims to reproduce the same behavior. However, we
> +cannot let the workers racily write to the same file on disk. Instead,
> +the workers detect when the entry that they want to check out would
> +collide with an existing file, and mark it with `PC_ITEM_COLLIDED`.
> +Later, the main process can sequentially feed these entries back to
> +`checkout_entry()` without the risk of race conditions. On clone, this
> +also has the effect of marking the colliding entries to later emit a
> +warning for the user, like the classic sequential checkout does.
> +
> +The workers are able to detect both collisions among the entries being
> +concurrently written and collisions among parallel-eligible and
> +ineligible entries.

I am not sure I understand the above correctly. Does this mean that if
there are 2 parallel-ineligible entries that collide the workers will
detect that? How is that possible if the parallel-ineligible entries
are not processed by the workers?

Or maybe s/among/between/ would make things clearer. For example:

"The workers are able to detect both collisions among entries being
concurrently written and collisions between a parallel-eligible entry and
one or more ineligible entries."

> The general idea for collision detection is quite
> +straightforward: for each parallel-eligible entry, the main process must
> +remove all files that prevent this entry from being written (before
> +enqueueing it). This includes any non-directory file in the leading path
> +of the entry. Later, when a worker gets assigned the entry, it looks
> +again for the non-directories files and for an already existing file at
> +the entry's path. If any of these checks finds something, the worker
> +knows that there was a path collision.
> +
> +Because parallel checkout can distinguish path collisions from the case
> +where the file was already present in the working tree before checkout,
> +we could alternatively choose to skip the checkout of colliding entries.
> +However, each entry that doesn't get written would have NULL `lstat()`
> +fields on the index. This could cause performance penalties for
> +subsequent commands that need to refresh the index, as they would have
> +to go to the file system to see if the entry is dirty. Thus, if we have
> +N entries in a colliding group and we decide to write and `lstat()` only
> +one of them, every subsequent `git-status` will have to read, convert,
> +and hash the written file N - 1 times. By checking out all colliding
> +entries (like the sequential code does), we only pay the overhead once,
> +during checkout.
> +
> +Eligible Entries for Parallel Checkout
> +--------------------------------------
> +
> +As previously mentioned, not all entries passed to `checkout_entry()`
> +will be considered eligible for parallel checkout. More specifically, we
> +exclude:
> +
> +- Symbolic links; to avoid race conditions that, in combination with
> +  path collisions, could cause workers to write files at the wrong
> +  place. For example, if we were to concurrently check out a symlink
> +  'a' -> 'b' and a regular file 'A/f' in a case-insensitive file system,
> +  we could potentially end up writing the file 'A/f' at 'a/f', due to a
> +  race condition.
> +
> +- Regular files that require external filters (either "one shot" filters
> +  or long-running process filters). These filters are black-boxes to Git
> +  and may have their own internal locking or non-concurrent assumptions.
> +  So it might not be safe to run multiple instances in parallel.
> ++
> +Besides, long-running filters may use the delayed checkout feature to
> +postpone the return of some filtered blobs. The delayed checkout queue
> +and the parallel checkout queue are not compatible and should remain
> +separated.

Maybe: s/separated/separate/

> +Note: regular files that only require internal filters, like end-of-line
> +conversion and re-encoding, are eligible for parallel checkout.
> +
> +Ineligible entries are checked out by the classic sequential codepath
> +*before* spawning workers.
> +
> +Note: submodules's files are also eligible for parallel checkout (as
> +long as they don't fall into any of the excluding categories mentioned
> +above). But since each submodule is checked out in its own child
> +process, we don't mix the superproject's and the submodules' files in
> +the same parallel checkout process or queue.
> +
> +The API
> +-------
> +
> +The parallel checkout API was designed with the goal to minimize changes

s/to minimize/of minimizing/

> +to the current users of the checkout machinery. This means that they
> +don't have to call a different function for sequential or parallel
> +checkout. As already mentioned, `checkout_entry()` will automatically
> +insert the given entry in the parallel checkout queue when this feature
> +is enabled and the entry is eligible; otherwise, it will just write the
> +entry right away, using the sequential code. In general, callers of the
> +parallel checkout API should look similar to this:

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 0/5] Parallel Checkout (part 2)
  2021-04-16 21:43   ` Junio C Hamano
  2021-04-17 19:57     ` Matheus Tavares Bernardino
@ 2021-04-19  9:41     ` Christian Couder
  1 sibling, 0 replies; 40+ messages in thread
From: Christian Couder @ 2021-04-19  9:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matheus Tavares, git, Jeff Hostetler

On Fri, Apr 16, 2021 at 11:43 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > This is the next step in the parallel checkout implementation. As
> > mt/parallel-checkout-part-1 is now on master, this round is based
> > directly on master.
> >
> > Changes since v1:
>
> Now, I do not think we've seen any response to these messages.
>
> It seems that review comments for the earlier round cf.
> <CAP8UFD1stvx=2hBWyxmu75SXRzX-bHBfGr2jxWKgHdc85cfxRg@mail.gmail.com>
>
> have been addressed?  Should we merge it down to 'next' now?

Yeah, my concerns in v1 were all addressed. I took another look at v3
and found only small nits in the documentation patch. For what it's
worth, with or without the small nits addressed, the v3 has my:

Reviewed-by: Christian Couder <chriscool@tuxfamily.org>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v4 0/5] Parallel Checkout (part 2)
  2021-04-19  0:14   ` [PATCH v3 " Matheus Tavares
                       ` (4 preceding siblings ...)
  2021-04-19  0:14     ` [PATCH v3 5/5] parallel-checkout: add design documentation Matheus Tavares
@ 2021-04-19 19:53     ` Matheus Tavares
  2021-04-19 19:53       ` [PATCH v4 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
                         ` (4 more replies)
  5 siblings, 5 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-19 19:53 UTC (permalink / raw)
  To: gitster; +Cc: git, christian.couder, git

This version is almost identical to v3, but the last patch incorporates
the typo fixes and other rewording suggestions Christian made about the
design doc on the last round.

I decided to remove the sentence about step 3 dominating the execution
time as that's not always the case on e.g. a non-local clone or
sparse-checkout.

Matheus Tavares (5):
  unpack-trees: add basic support for parallel checkout
  parallel-checkout: make it truly parallel
  parallel-checkout: add configuration options
  parallel-checkout: support progress displaying
  parallel-checkout: add design documentation

 .gitignore                                    |   1 +
 Documentation/Makefile                        |   1 +
 Documentation/config/checkout.txt             |  21 +
 Documentation/technical/parallel-checkout.txt | 270 ++++++++
 Makefile                                      |   2 +
 builtin.h                                     |   1 +
 builtin/checkout--worker.c                    | 145 ++++
 entry.c                                       |  17 +-
 git.c                                         |   2 +
 parallel-checkout.c                           | 655 ++++++++++++++++++
 parallel-checkout.h                           | 111 +++
 unpack-trees.c                                |  19 +-
 12 files changed, 1240 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/technical/parallel-checkout.txt
 create mode 100644 builtin/checkout--worker.c
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h

Range-diff against v3:
1:  7096822c14 = 1:  7096822c14 unpack-trees: add basic support for parallel checkout
2:  4526516ea0 = 2:  4526516ea0 parallel-checkout: make it truly parallel
3:  ad165c0637 = 3:  ad165c0637 parallel-checkout: add configuration options
4:  cf9e28dc0e = 4:  cf9e28dc0e parallel-checkout: support progress displaying
5:  415d4114aa ! 5:  fd929f072c parallel-checkout: add design documentation
    @@ Documentation/technical/parallel-checkout.txt (new)
     +* Step 4: Write the new index to disk.
     +
     +Step 3 is the focus of the "parallel checkout" effort described here.
    -+It dominates the execution time for most of the above command types.
     +
     +Sequential Implementation
     +-------------------------
    @@ Documentation/technical/parallel-checkout.txt (new)
     +It wouldn't be safe to perform Step 3b in parallel, as there could be
     +race conditions between file creations and removals. Instead, the
     +parallel checkout framework lets the sequential code handle Step 3b,
    -+and use parallel workers to replace the sequential
    ++and uses parallel workers to replace the sequential
     +`entry.c:write_entry()` calls from Step 3c.
     +
     +Rejected Multi-Threaded Solution
    @@ Documentation/technical/parallel-checkout.txt (new)
     +warning for the user, like the classic sequential checkout does.
     +
     +The workers are able to detect both collisions among the entries being
    -+concurrently written and collisions among parallel-eligible and
    -+ineligible entries. The general idea for collision detection is quite
    -+straightforward: for each parallel-eligible entry, the main process must
    -+remove all files that prevent this entry from being written (before
    -+enqueueing it). This includes any non-directory file in the leading path
    -+of the entry. Later, when a worker gets assigned the entry, it looks
    -+again for the non-directories files and for an already existing file at
    -+the entry's path. If any of these checks finds something, the worker
    -+knows that there was a path collision.
    ++concurrently written and collisions between a parallel-eligible entry
    ++and an ineligible entry. The general idea for collision detection is
    ++quite straightforward: for each parallel-eligible entry, the main
    ++process must remove all files that prevent this entry from being written
    ++(before enqueueing it). This includes any non-directory file in the
    ++leading path of the entry. Later, when a worker gets assigned the entry,
    ++it looks again for the non-directories files and for an already existing
    ++file at the entry's path. If any of these checks finds something, the
    ++worker knows that there was a path collision.
     +
     +Because parallel checkout can distinguish path collisions from the case
     +where the file was already present in the working tree before checkout,
    @@ Documentation/technical/parallel-checkout.txt (new)
     +Besides, long-running filters may use the delayed checkout feature to
     +postpone the return of some filtered blobs. The delayed checkout queue
     +and the parallel checkout queue are not compatible and should remain
    -+separated.
    ++separate.
     ++
     +Note: regular files that only require internal filters, like end-of-line
     +conversion and re-encoding, are eligible for parallel checkout.
    @@ Documentation/technical/parallel-checkout.txt (new)
     +The API
     +-------
     +
    -+The parallel checkout API was designed with the goal to minimize changes
    -+to the current users of the checkout machinery. This means that they
    -+don't have to call a different function for sequential or parallel
    ++The parallel checkout API was designed with the goal of minimizing
    ++changes to the current users of the checkout machinery. This means that
    ++they don't have to call a different function for sequential or parallel
     +checkout. As already mentioned, `checkout_entry()` will automatically
     +insert the given entry in the parallel checkout queue when this feature
     +is enabled and the entry is eligible; otherwise, it will just write the
-- 
2.30.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v4 1/5] unpack-trees: add basic support for parallel checkout
  2021-04-19 19:53     ` [PATCH v4 0/5] Parallel Checkout (part 2) Matheus Tavares
@ 2021-04-19 19:53       ` Matheus Tavares
  2021-04-19 19:53       ` [PATCH v4 2/5] parallel-checkout: make it truly parallel Matheus Tavares
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-19 19:53 UTC (permalink / raw)
  To: gitster; +Cc: git, christian.couder, git

This new interface allows us to enqueue some of the entries being
checked out to later uncompress them, apply in-process filters, and
write out the files in parallel. For now, the parallel checkout
machinery is enabled by default and there is no user configuration, but
run_parallel_checkout() just writes the queued entries in sequence
(without spawning additional workers). The next patch will actually
implement the parallelism and, later, we will make it configurable.

Note that, to avoid potential data races, not all entries are eligible
for parallel checkout. Also, paths that collide on disk (e.g.
case-sensitive paths in case-insensitive file systems), are detected by
the parallel checkout code and skipped, so that they can be safely
sequentially handled later. The collision detection works like the
following:

- If the collision was at basename (e.g. 'a/b' and 'a/B'), the framework
  detects it by looking for EEXIST and EISDIR errors after an
  open(O_CREAT | O_EXCL) failure.

- If the collision was at dirname (e.g. 'a/b' and 'A'), it is detected
  at the has_dirs_only_path() check, which is done for the leading path
  of each item in the parallel checkout queue.

Both verifications rely on the fact that, before enqueueing an entry for
parallel checkout, checkout_entry() makes sure that there is no file at
the entry's path and that its leading components are all real
directories. So, any later change in these conditions indicates that
there was a collision (either between two parallel-eligible entries or
between an eligible and an ineligible one).

After all parallel-eligible entries have been processed, the collided
(and thus, skipped) entries are sequentially fed to checkout_entry()
again. This is similar to the way the current code deals with
collisions, overwriting the previously checked out entries with the
subsequent ones. The only difference is that, since we no longer create
the files in the same order that they appear on index, we are not able
to determine which of the colliding entries will survive on disk (for
the classic code, it is always the last entry).

Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Makefile            |   1 +
 entry.c             |  17 ++-
 parallel-checkout.c | 365 ++++++++++++++++++++++++++++++++++++++++++++
 parallel-checkout.h |  32 ++++
 unpack-trees.c      |   6 +-
 5 files changed, 418 insertions(+), 3 deletions(-)
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h

diff --git a/Makefile b/Makefile
index a6a73c5741..99734cc2ea 100644
--- a/Makefile
+++ b/Makefile
@@ -946,6 +946,7 @@ LIB_OBJS += pack-revindex.o
 LIB_OBJS += pack-write.o
 LIB_OBJS += packfile.o
 LIB_OBJS += pager.o
+LIB_OBJS += parallel-checkout.o
 LIB_OBJS += parse-options-cb.o
 LIB_OBJS += parse-options.o
 LIB_OBJS += patch-delta.o
diff --git a/entry.c b/entry.c
index 2dc94ba5cc..d7ed38aa40 100644
--- a/entry.c
+++ b/entry.c
@@ -7,6 +7,7 @@
 #include "progress.h"
 #include "fsmonitor.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -426,8 +427,17 @@ static void mark_colliding_entries(const struct checkout *state,
 	for (i = 0; i < state->istate->cache_nr; i++) {
 		struct cache_entry *dup = state->istate->cache[i];
 
-		if (dup == ce)
-			break;
+		if (dup == ce) {
+			/*
+			 * Parallel checkout doesn't create the files in index
+			 * order. So the other side of the collision may appear
+			 * after the given cache_entry in the array.
+			 */
+			if (parallel_checkout_status() == PC_RUNNING)
+				continue;
+			else
+				break;
+		}
 
 		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
 			continue;
@@ -536,6 +546,9 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
 		ca = &ca_buf;
 	}
 
+	if (!enqueue_checkout(ce, ca))
+		return 0;
+
 	return write_entry(ce, path.buf, ca, state, 0);
 }
 
diff --git a/parallel-checkout.c b/parallel-checkout.c
new file mode 100644
index 0000000000..590e2a3046
--- /dev/null
+++ b/parallel-checkout.c
@@ -0,0 +1,365 @@
+#include "cache.h"
+#include "entry.h"
+#include "parallel-checkout.h"
+#include "streaming.h"
+
+enum pc_item_status {
+	PC_ITEM_PENDING = 0,
+	PC_ITEM_WRITTEN,
+	/*
+	 * The entry could not be written because there was another file
+	 * already present in its path or leading directories. Since
+	 * checkout_entry_ca() removes such files from the working tree before
+	 * enqueueing the entry for parallel checkout, it means that there was
+	 * a path collision among the entries being written.
+	 */
+	PC_ITEM_COLLIDED,
+	PC_ITEM_FAILED,
+};
+
+struct parallel_checkout_item {
+	/* pointer to a istate->cache[] entry. Not owned by us. */
+	struct cache_entry *ce;
+	struct conv_attrs ca;
+	struct stat st;
+	enum pc_item_status status;
+};
+
+struct parallel_checkout {
+	enum pc_status status;
+	struct parallel_checkout_item *items; /* The parallel checkout queue. */
+	size_t nr, alloc;
+};
+
+static struct parallel_checkout parallel_checkout;
+
+enum pc_status parallel_checkout_status(void)
+{
+	return parallel_checkout.status;
+}
+
+void init_parallel_checkout(void)
+{
+	if (parallel_checkout.status != PC_UNINITIALIZED)
+		BUG("parallel checkout already initialized");
+
+	parallel_checkout.status = PC_ACCEPTING_ENTRIES;
+}
+
+static void finish_parallel_checkout(void)
+{
+	if (parallel_checkout.status == PC_UNINITIALIZED)
+		BUG("cannot finish parallel checkout: not initialized yet");
+
+	free(parallel_checkout.items);
+	memset(&parallel_checkout, 0, sizeof(parallel_checkout));
+}
+
+static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
+					     const struct conv_attrs *ca)
+{
+	enum conv_attrs_classification c;
+
+	/*
+	 * Symlinks cannot be checked out in parallel as, in case of path
+	 * collision, they could racily replace leading directories of other
+	 * entries being checked out. Submodules are checked out in child
+	 * processes, which have their own parallel checkout queues.
+	 */
+	if (!S_ISREG(ce->ce_mode))
+		return 0;
+
+	c = classify_conv_attrs(ca);
+	switch (c) {
+	case CA_CLASS_INCORE:
+		return 1;
+
+	case CA_CLASS_INCORE_FILTER:
+		/*
+		 * It would be safe to allow concurrent instances of
+		 * single-file smudge filters, like rot13, but we should not
+		 * assume that all filters are parallel-process safe. So we
+		 * don't allow this.
+		 */
+		return 0;
+
+	case CA_CLASS_INCORE_PROCESS:
+		/*
+		 * The parallel queue and the delayed queue are not compatible,
+		 * so they must be kept completely separated. And we can't tell
+		 * if a long-running process will delay its response without
+		 * actually asking it to perform the filtering. Therefore, this
+		 * type of filter is not allowed in parallel checkout.
+		 *
+		 * Furthermore, there should only be one instance of the
+		 * long-running process filter as we don't know how it is
+		 * managing its own concurrency. So, spreading the entries that
+		 * requisite such a filter among the parallel workers would
+		 * require a lot more inter-process communication. We would
+		 * probably have to designate a single process to interact with
+		 * the filter and send all the necessary data to it, for each
+		 * entry.
+		 */
+		return 0;
+
+	case CA_CLASS_STREAMABLE:
+		return 1;
+
+	default:
+		BUG("unsupported conv_attrs classification '%d'", c);
+	}
+}
+
+int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
+{
+	struct parallel_checkout_item *pc_item;
+
+	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES ||
+	    !is_eligible_for_parallel_checkout(ce, ca))
+		return -1;
+
+	ALLOC_GROW(parallel_checkout.items, parallel_checkout.nr + 1,
+		   parallel_checkout.alloc);
+
+	pc_item = &parallel_checkout.items[parallel_checkout.nr++];
+	pc_item->ce = ce;
+	memcpy(&pc_item->ca, ca, sizeof(pc_item->ca));
+	pc_item->status = PC_ITEM_PENDING;
+
+	return 0;
+}
+
+static int handle_results(struct checkout *state)
+{
+	int ret = 0;
+	size_t i;
+	int have_pending = 0;
+
+	/*
+	 * We first update the successfully written entries with the collected
+	 * stat() data, so that they can be found by mark_colliding_entries(),
+	 * in the next loop, when necessary.
+	 */
+	for (i = 0; i < parallel_checkout.nr; i++) {
+		struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+		if (pc_item->status == PC_ITEM_WRITTEN)
+			update_ce_after_write(state, pc_item->ce, &pc_item->st);
+	}
+
+	for (i = 0; i < parallel_checkout.nr; i++) {
+		struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+
+		switch(pc_item->status) {
+		case PC_ITEM_WRITTEN:
+			/* Already handled */
+			break;
+		case PC_ITEM_COLLIDED:
+			/*
+			 * The entry could not be checked out due to a path
+			 * collision with another entry. Since there can only
+			 * be one entry of each colliding group on the disk, we
+			 * could skip trying to check out this one and move on.
+			 * However, this would leave the unwritten entries with
+			 * null stat() fields on the index, which could
+			 * potentially slow down subsequent operations that
+			 * require refreshing it: git would not be able to
+			 * trust st_size and would have to go to the filesystem
+			 * to see if the contents match (see ie_modified()).
+			 *
+			 * Instead, let's pay the overhead only once, now, and
+			 * call checkout_entry_ca() again for this file, to
+			 * have its stat() data stored in the index. This also
+			 * has the benefit of adding this entry and its
+			 * colliding pair to the collision report message.
+			 * Additionally, this overwriting behavior is consistent
+			 * with what the sequential checkout does, so it doesn't
+			 * add any extra overhead.
+			 */
+			ret |= checkout_entry_ca(pc_item->ce, &pc_item->ca,
+						 state, NULL, NULL);
+			break;
+		case PC_ITEM_PENDING:
+			have_pending = 1;
+			/* fall through */
+		case PC_ITEM_FAILED:
+			ret = -1;
+			break;
+		default:
+			BUG("unknown checkout item status in parallel checkout");
+		}
+	}
+
+	if (have_pending)
+		error("parallel checkout finished with pending entries");
+
+	return ret;
+}
+
+static int reset_fd(int fd, const char *path)
+{
+	if (lseek(fd, 0, SEEK_SET) != 0)
+		return error_errno("failed to rewind descriptor of '%s'", path);
+	if (ftruncate(fd, 0))
+		return error_errno("failed to truncate file '%s'", path);
+	return 0;
+}
+
+static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
+			       const char *path)
+{
+	int ret;
+	struct stream_filter *filter;
+	struct strbuf buf = STRBUF_INIT;
+	char *blob;
+	unsigned long size;
+	ssize_t wrote;
+
+	/* Sanity check */
+	assert(is_eligible_for_parallel_checkout(pc_item->ce, &pc_item->ca));
+
+	filter = get_stream_filter_ca(&pc_item->ca, &pc_item->ce->oid);
+	if (filter) {
+		if (stream_blob_to_fd(fd, &pc_item->ce->oid, filter, 1)) {
+			/* On error, reset fd to try writing without streaming */
+			if (reset_fd(fd, path))
+				return -1;
+		} else {
+			return 0;
+		}
+	}
+
+	blob = read_blob_entry(pc_item->ce, &size);
+	if (!blob)
+		return error("cannot read object %s '%s'",
+			     oid_to_hex(&pc_item->ce->oid), pc_item->ce->name);
+
+	/*
+	 * checkout metadata is used to give context for external process
+	 * filters. Files requiring such filters are not eligible for parallel
+	 * checkout, so pass NULL.
+	 */
+	ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
+					 blob, size, &buf, NULL);
+
+	if (ret) {
+		size_t newsize;
+		free(blob);
+		blob = strbuf_detach(&buf, &newsize);
+		size = newsize;
+	}
+
+	wrote = write_in_full(fd, blob, size);
+	free(blob);
+	if (wrote < 0)
+		return error("unable to write file '%s'", path);
+
+	return 0;
+}
+
+static int close_and_clear(int *fd)
+{
+	int ret = 0;
+
+	if (*fd >= 0) {
+		ret = close(*fd);
+		*fd = -1;
+	}
+
+	return ret;
+}
+
+static void write_pc_item(struct parallel_checkout_item *pc_item,
+			  struct checkout *state)
+{
+	unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
+	int fd = -1, fstat_done = 0;
+	struct strbuf path = STRBUF_INIT;
+	const char *dir_sep;
+
+	strbuf_add(&path, state->base_dir, state->base_dir_len);
+	strbuf_add(&path, pc_item->ce->name, pc_item->ce->ce_namelen);
+
+	dir_sep = find_last_dir_sep(path.buf);
+
+	/*
+	 * The leading dirs should have been already created by now. But, in
+	 * case of path collisions, one of the dirs could have been replaced by
+	 * a symlink (checked out after we enqueued this entry for parallel
+	 * checkout). Thus, we must check the leading dirs again.
+	 */
+	if (dir_sep && !has_dirs_only_path(path.buf, dir_sep - path.buf,
+					   state->base_dir_len)) {
+		pc_item->status = PC_ITEM_COLLIDED;
+		goto out;
+	}
+
+	fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, mode);
+
+	if (fd < 0) {
+		if (errno == EEXIST || errno == EISDIR) {
+			/*
+			 * Errors which probably represent a path collision.
+			 * Suppress the error message and mark the item to be
+			 * retried later, sequentially. ENOTDIR and ENOENT are
+			 * also interesting, but the above has_dirs_only_path()
+			 * call should have already caught these cases.
+			 */
+			pc_item->status = PC_ITEM_COLLIDED;
+		} else {
+			error_errno("failed to open file '%s'", path.buf);
+			pc_item->status = PC_ITEM_FAILED;
+		}
+		goto out;
+	}
+
+	if (write_pc_item_to_fd(pc_item, fd, path.buf)) {
+		/* Error was already reported. */
+		pc_item->status = PC_ITEM_FAILED;
+		close_and_clear(&fd);
+		unlink(path.buf);
+		goto out;
+	}
+
+	fstat_done = fstat_checkout_output(fd, state, &pc_item->st);
+
+	if (close_and_clear(&fd)) {
+		error_errno("unable to close file '%s'", path.buf);
+		pc_item->status = PC_ITEM_FAILED;
+		goto out;
+	}
+
+	if (state->refresh_cache && !fstat_done && lstat(path.buf, &pc_item->st) < 0) {
+		error_errno("unable to stat just-written file '%s'",  path.buf);
+		pc_item->status = PC_ITEM_FAILED;
+		goto out;
+	}
+
+	pc_item->status = PC_ITEM_WRITTEN;
+
+out:
+	strbuf_release(&path);
+}
+
+static void write_items_sequentially(struct checkout *state)
+{
+	size_t i;
+
+	for (i = 0; i < parallel_checkout.nr; i++)
+		write_pc_item(&parallel_checkout.items[i], state);
+}
+
+int run_parallel_checkout(struct checkout *state)
+{
+	int ret;
+
+	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
+		BUG("cannot run parallel checkout: uninitialized or already running");
+
+	parallel_checkout.status = PC_RUNNING;
+
+	write_items_sequentially(state);
+	ret = handle_results(state);
+
+	finish_parallel_checkout();
+	return ret;
+}
diff --git a/parallel-checkout.h b/parallel-checkout.h
new file mode 100644
index 0000000000..4ad2a519b3
--- /dev/null
+++ b/parallel-checkout.h
@@ -0,0 +1,32 @@
+#ifndef PARALLEL_CHECKOUT_H
+#define PARALLEL_CHECKOUT_H
+
+struct cache_entry;
+struct checkout;
+struct conv_attrs;
+
+enum pc_status {
+	PC_UNINITIALIZED = 0,
+	PC_ACCEPTING_ENTRIES,
+	PC_RUNNING,
+};
+
+enum pc_status parallel_checkout_status(void);
+
+/*
+ * Put parallel checkout into the PC_ACCEPTING_ENTRIES state. Should be used
+ * only when in the PC_UNINITIALIZED state.
+ */
+void init_parallel_checkout(void);
+
+/*
+ * Return -1 if parallel checkout is currently not accepting entries or if the
+ * entry is not eligible for parallel checkout. Otherwise, enqueue the entry
+ * for later write and return 0.
+ */
+int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
+
+/* Write all the queued entries, returning 0 on success.*/
+int run_parallel_checkout(struct checkout *state);
+
+#endif /* PARALLEL_CHECKOUT_H */
diff --git a/unpack-trees.c b/unpack-trees.c
index 8a1afbc1e4..f0430d458d 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -17,6 +17,7 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 #include "entry.h"
+#include "parallel-checkout.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -441,7 +442,6 @@ static int check_updates(struct unpack_trees_options *o,
 	if (should_update_submodules())
 		load_gitmodules_file(index, &state);
 
-	enable_delayed_checkout(&state);
 	if (has_promisor_remote()) {
 		/*
 		 * Prefetch the objects that are to be checked out in the loop
@@ -464,6 +464,9 @@ static int check_updates(struct unpack_trees_options *o,
 					   to_fetch.oid, to_fetch.nr);
 		oid_array_clear(&to_fetch);
 	}
+
+	enable_delayed_checkout(&state);
+	init_parallel_checkout();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -477,6 +480,7 @@ static int check_updates(struct unpack_trees_options *o,
 		}
 	}
 	stop_progress(&progress);
+	errs |= run_parallel_checkout(&state);
 	errs |= finish_delayed_checkout(&state, NULL);
 	git_attr_set_direction(GIT_ATTR_CHECKIN);
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 2/5] parallel-checkout: make it truly parallel
  2021-04-19 19:53     ` [PATCH v4 0/5] Parallel Checkout (part 2) Matheus Tavares
  2021-04-19 19:53       ` [PATCH v4 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
@ 2021-04-19 19:53       ` Matheus Tavares
  2021-04-19 19:53       ` [PATCH v4 3/5] parallel-checkout: add configuration options Matheus Tavares
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-19 19:53 UTC (permalink / raw)
  To: gitster; +Cc: git, christian.couder, git

Use multiple worker processes to distribute the queued entries and call
write_pc_item() in parallel for them. The items are distributed
uniformly in contiguous chunks. This minimizes the chances of two
workers writing to the same directory simultaneously, which could affect
performance due to lock contention in the kernel. Work stealing (or any
other format of re-distribution) is not implemented yet.

The protocol between the main process and the workers is quite simple.
They exchange binary messages packed in pkt-line format, and use
PKT-FLUSH to mark the end of input (from both sides). The main process
starts the communication by sending N pkt-lines, each corresponding to
an item that needs to be written. These packets contain all the
necessary information to load, smudge, and write the blob associated
with each item. Then it waits for the worker to send back N pkt-lines
containing the results for each item. The resulting packet must contain:
the identification number of the item that it refers to, the status of
the operation, and the lstat() data gathered after writing the file (iff
the operation was successful).

For now, checkout always uses a hardcoded value of 2 workers, only to
demonstrate that the parallel checkout framework correctly divides and
writes the queued entries. The next patch will add user configurations
and define a more reasonable default, based on tests with the said
settings.

Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 .gitignore                 |   1 +
 Makefile                   |   1 +
 builtin.h                  |   1 +
 builtin/checkout--worker.c | 145 ++++++++++++++++++
 git.c                      |   2 +
 parallel-checkout.c        | 300 +++++++++++++++++++++++++++++++++----
 parallel-checkout.h        |  73 ++++++++-
 7 files changed, 496 insertions(+), 27 deletions(-)
 create mode 100644 builtin/checkout--worker.c

diff --git a/.gitignore b/.gitignore
index 3dcdb6bb5a..96c794b1c7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -33,6 +33,7 @@
 /git-check-mailmap
 /git-check-ref-format
 /git-checkout
+/git-checkout--worker
 /git-checkout-index
 /git-cherry
 /git-cherry-pick
diff --git a/Makefile b/Makefile
index 99734cc2ea..84e15f58a0 100644
--- a/Makefile
+++ b/Makefile
@@ -1062,6 +1062,7 @@ BUILTIN_OBJS += builtin/check-attr.o
 BUILTIN_OBJS += builtin/check-ignore.o
 BUILTIN_OBJS += builtin/check-mailmap.o
 BUILTIN_OBJS += builtin/check-ref-format.o
+BUILTIN_OBJS += builtin/checkout--worker.o
 BUILTIN_OBJS += builtin/checkout-index.o
 BUILTIN_OBJS += builtin/checkout.o
 BUILTIN_OBJS += builtin/clean.o
diff --git a/builtin.h b/builtin.h
index b6ce981b73..16ecd5586f 100644
--- a/builtin.h
+++ b/builtin.h
@@ -123,6 +123,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix);
 int cmd_bundle(int argc, const char **argv, const char *prefix);
 int cmd_cat_file(int argc, const char **argv, const char *prefix);
 int cmd_checkout(int argc, const char **argv, const char *prefix);
+int cmd_checkout__worker(int argc, const char **argv, const char *prefix);
 int cmd_checkout_index(int argc, const char **argv, const char *prefix);
 int cmd_check_attr(int argc, const char **argv, const char *prefix);
 int cmd_check_ignore(int argc, const char **argv, const char *prefix);
diff --git a/builtin/checkout--worker.c b/builtin/checkout--worker.c
new file mode 100644
index 0000000000..31e0de2f7e
--- /dev/null
+++ b/builtin/checkout--worker.c
@@ -0,0 +1,145 @@
+#include "builtin.h"
+#include "config.h"
+#include "entry.h"
+#include "parallel-checkout.h"
+#include "parse-options.h"
+#include "pkt-line.h"
+
+static void packet_to_pc_item(const char *buffer, int len,
+			      struct parallel_checkout_item *pc_item)
+{
+	const struct pc_item_fixed_portion *fixed_portion;
+	const char *variant;
+	char *encoding;
+
+	if (len < sizeof(struct pc_item_fixed_portion))
+		BUG("checkout worker received too short item (got %dB, exp %dB)",
+		    len, (int)sizeof(struct pc_item_fixed_portion));
+
+	fixed_portion = (struct pc_item_fixed_portion *)buffer;
+
+	if (len - sizeof(struct pc_item_fixed_portion) !=
+		fixed_portion->name_len + fixed_portion->working_tree_encoding_len)
+		BUG("checkout worker received corrupted item");
+
+	variant = buffer + sizeof(struct pc_item_fixed_portion);
+
+	/*
+	 * Note: the main process uses zero length to communicate that the
+	 * encoding is NULL. There is no use case that requires sending an
+	 * actual empty string, since convert_attrs() never sets
+	 * ca.working_tree_enconding to "".
+	 */
+	if (fixed_portion->working_tree_encoding_len) {
+		encoding = xmemdupz(variant,
+				    fixed_portion->working_tree_encoding_len);
+		variant += fixed_portion->working_tree_encoding_len;
+	} else {
+		encoding = NULL;
+	}
+
+	memset(pc_item, 0, sizeof(*pc_item));
+	pc_item->ce = make_empty_transient_cache_entry(fixed_portion->name_len);
+	pc_item->ce->ce_namelen = fixed_portion->name_len;
+	pc_item->ce->ce_mode = fixed_portion->ce_mode;
+	memcpy(pc_item->ce->name, variant, pc_item->ce->ce_namelen);
+	oidcpy(&pc_item->ce->oid, &fixed_portion->oid);
+
+	pc_item->id = fixed_portion->id;
+	pc_item->ca.crlf_action = fixed_portion->crlf_action;
+	pc_item->ca.ident = fixed_portion->ident;
+	pc_item->ca.working_tree_encoding = encoding;
+}
+
+static void report_result(struct parallel_checkout_item *pc_item)
+{
+	struct pc_item_result res;
+	size_t size;
+
+	res.id = pc_item->id;
+	res.status = pc_item->status;
+
+	if (pc_item->status == PC_ITEM_WRITTEN) {
+		res.st = pc_item->st;
+		size = sizeof(res);
+	} else {
+		size = PC_ITEM_RESULT_BASE_SIZE;
+	}
+
+	packet_write(1, (const char *)&res, size);
+}
+
+/* Free the worker-side malloced data, but not pc_item itself. */
+static void release_pc_item_data(struct parallel_checkout_item *pc_item)
+{
+	free((char *)pc_item->ca.working_tree_encoding);
+	discard_cache_entry(pc_item->ce);
+}
+
+static void worker_loop(struct checkout *state)
+{
+	struct parallel_checkout_item *items = NULL;
+	size_t i, nr = 0, alloc = 0;
+
+	while (1) {
+		int len = packet_read(0, NULL, NULL, packet_buffer,
+				      sizeof(packet_buffer), 0);
+
+		if (len < 0)
+			BUG("packet_read() returned negative value");
+		else if (!len)
+			break;
+
+		ALLOC_GROW(items, nr + 1, alloc);
+		packet_to_pc_item(packet_buffer, len, &items[nr++]);
+	}
+
+	for (i = 0; i < nr; i++) {
+		struct parallel_checkout_item *pc_item = &items[i];
+		write_pc_item(pc_item, state);
+		report_result(pc_item);
+		release_pc_item_data(pc_item);
+	}
+
+	packet_flush(1);
+
+	free(items);
+}
+
+static const char * const checkout_worker_usage[] = {
+	N_("git checkout--worker [<options>]"),
+	NULL
+};
+
+int cmd_checkout__worker(int argc, const char **argv, const char *prefix)
+{
+	struct checkout state = CHECKOUT_INIT;
+	struct option checkout_worker_options[] = {
+		OPT_STRING(0, "prefix", &state.base_dir, N_("string"),
+			N_("when creating files, prepend <string>")),
+		OPT_END()
+	};
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(checkout_worker_usage,
+				   checkout_worker_options);
+
+	git_config(git_default_config, NULL);
+	argc = parse_options(argc, argv, prefix, checkout_worker_options,
+			     checkout_worker_usage, 0);
+	if (argc > 0)
+		usage_with_options(checkout_worker_usage, checkout_worker_options);
+
+	if (state.base_dir)
+		state.base_dir_len = strlen(state.base_dir);
+
+	/*
+	 * Setting this on a worker won't actually update the index. We just
+	 * need to tell the checkout machinery to lstat() the written entries,
+	 * so that we can send this data back to the main process.
+	 */
+	state.refresh_cache = 1;
+
+	worker_loop(&state);
+	return 0;
+}
diff --git a/git.c b/git.c
index 9bc077a025..532b5b4136 100644
--- a/git.c
+++ b/git.c
@@ -490,6 +490,8 @@ static struct cmd_struct commands[] = {
 	{ "check-mailmap", cmd_check_mailmap, RUN_SETUP },
 	{ "check-ref-format", cmd_check_ref_format, NO_PARSEOPT  },
 	{ "checkout", cmd_checkout, RUN_SETUP | NEED_WORK_TREE },
+	{ "checkout--worker", cmd_checkout__worker,
+		RUN_SETUP | NEED_WORK_TREE | SUPPORT_SUPER_PREFIX },
 	{ "checkout-index", cmd_checkout_index,
 		RUN_SETUP | NEED_WORK_TREE},
 	{ "cherry", cmd_cherry, RUN_SETUP },
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 590e2a3046..836154fec6 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -1,28 +1,14 @@
 #include "cache.h"
 #include "entry.h"
 #include "parallel-checkout.h"
+#include "pkt-line.h"
+#include "run-command.h"
+#include "sigchain.h"
 #include "streaming.h"
 
-enum pc_item_status {
-	PC_ITEM_PENDING = 0,
-	PC_ITEM_WRITTEN,
-	/*
-	 * The entry could not be written because there was another file
-	 * already present in its path or leading directories. Since
-	 * checkout_entry_ca() removes such files from the working tree before
-	 * enqueueing the entry for parallel checkout, it means that there was
-	 * a path collision among the entries being written.
-	 */
-	PC_ITEM_COLLIDED,
-	PC_ITEM_FAILED,
-};
-
-struct parallel_checkout_item {
-	/* pointer to a istate->cache[] entry. Not owned by us. */
-	struct cache_entry *ce;
-	struct conv_attrs ca;
-	struct stat st;
-	enum pc_item_status status;
+struct pc_worker {
+	struct child_process cp;
+	size_t next_item_to_complete, nr_items_to_complete;
 };
 
 struct parallel_checkout {
@@ -59,6 +45,7 @@ static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
 					     const struct conv_attrs *ca)
 {
 	enum conv_attrs_classification c;
+	size_t packed_item_size;
 
 	/*
 	 * Symlinks cannot be checked out in parallel as, in case of path
@@ -69,6 +56,18 @@ static int is_eligible_for_parallel_checkout(const struct cache_entry *ce,
 	if (!S_ISREG(ce->ce_mode))
 		return 0;
 
+	packed_item_size = sizeof(struct pc_item_fixed_portion) + ce->ce_namelen +
+		(ca->working_tree_encoding ? strlen(ca->working_tree_encoding) : 0);
+
+	/*
+	 * The amount of data we send to the workers per checkout item is
+	 * typically small (75~300B). So unless we find an insanely huge path
+	 * of 64KB, we should never reach the 65KB limit of one pkt-line. If
+	 * that does happen, we let the sequential code handle the item.
+	 */
+	if (packed_item_size > LARGE_PACKET_DATA_MAX)
+		return 0;
+
 	c = classify_conv_attrs(ca);
 	switch (c) {
 	case CA_CLASS_INCORE:
@@ -121,10 +120,12 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
 	ALLOC_GROW(parallel_checkout.items, parallel_checkout.nr + 1,
 		   parallel_checkout.alloc);
 
-	pc_item = &parallel_checkout.items[parallel_checkout.nr++];
+	pc_item = &parallel_checkout.items[parallel_checkout.nr];
 	pc_item->ce = ce;
 	memcpy(&pc_item->ca, ca, sizeof(pc_item->ca));
 	pc_item->status = PC_ITEM_PENDING;
+	pc_item->id = parallel_checkout.nr;
+	parallel_checkout.nr++;
 
 	return 0;
 }
@@ -236,7 +237,8 @@ static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
 	/*
 	 * checkout metadata is used to give context for external process
 	 * filters. Files requiring such filters are not eligible for parallel
-	 * checkout, so pass NULL.
+	 * checkout, so pass NULL. Note: if that changes, the metadata must also
+	 * be passed from the main process to the workers.
 	 */
 	ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
 					 blob, size, &buf, NULL);
@@ -268,8 +270,8 @@ static int close_and_clear(int *fd)
 	return ret;
 }
 
-static void write_pc_item(struct parallel_checkout_item *pc_item,
-			  struct checkout *state)
+void write_pc_item(struct parallel_checkout_item *pc_item,
+		   struct checkout *state)
 {
 	unsigned int mode = (pc_item->ce->ce_mode & 0100) ? 0777 : 0666;
 	int fd = -1, fstat_done = 0;
@@ -340,6 +342,240 @@ static void write_pc_item(struct parallel_checkout_item *pc_item,
 	strbuf_release(&path);
 }
 
+static void send_one_item(int fd, struct parallel_checkout_item *pc_item)
+{
+	size_t len_data;
+	char *data, *variant;
+	struct pc_item_fixed_portion *fixed_portion;
+	const char *working_tree_encoding = pc_item->ca.working_tree_encoding;
+	size_t name_len = pc_item->ce->ce_namelen;
+	size_t working_tree_encoding_len = working_tree_encoding ?
+					   strlen(working_tree_encoding) : 0;
+
+	/*
+	 * Any changes in the calculation of the message size must also be made
+	 * in is_eligible_for_parallel_checkout().
+	 */
+	len_data = sizeof(struct pc_item_fixed_portion) + name_len +
+		   working_tree_encoding_len;
+
+	data = xcalloc(1, len_data);
+
+	fixed_portion = (struct pc_item_fixed_portion *)data;
+	fixed_portion->id = pc_item->id;
+	fixed_portion->ce_mode = pc_item->ce->ce_mode;
+	fixed_portion->crlf_action = pc_item->ca.crlf_action;
+	fixed_portion->ident = pc_item->ca.ident;
+	fixed_portion->name_len = name_len;
+	fixed_portion->working_tree_encoding_len = working_tree_encoding_len;
+	/*
+	 * We use hashcpy() instead of oidcpy() because the hash[] positions
+	 * after `the_hash_algo->rawsz` might not be initialized. And Valgrind
+	 * would complain about passing uninitialized bytes to a syscall
+	 * (write(2)). There is no real harm in this case, but the warning could
+	 * hinder the detection of actual errors.
+	 */
+	hashcpy(fixed_portion->oid.hash, pc_item->ce->oid.hash);
+
+	variant = data + sizeof(*fixed_portion);
+	if (working_tree_encoding_len) {
+		memcpy(variant, working_tree_encoding, working_tree_encoding_len);
+		variant += working_tree_encoding_len;
+	}
+	memcpy(variant, pc_item->ce->name, name_len);
+
+	packet_write(fd, data, len_data);
+
+	free(data);
+}
+
+static void send_batch(int fd, size_t start, size_t nr)
+{
+	size_t i;
+	sigchain_push(SIGPIPE, SIG_IGN);
+	for (i = 0; i < nr; i++)
+		send_one_item(fd, &parallel_checkout.items[start + i]);
+	packet_flush(fd);
+	sigchain_pop(SIGPIPE);
+}
+
+static struct pc_worker *setup_workers(struct checkout *state, int num_workers)
+{
+	struct pc_worker *workers;
+	int i, workers_with_one_extra_item;
+	size_t base_batch_size, batch_beginning = 0;
+
+	ALLOC_ARRAY(workers, num_workers);
+
+	for (i = 0; i < num_workers; i++) {
+		struct child_process *cp = &workers[i].cp;
+
+		child_process_init(cp);
+		cp->git_cmd = 1;
+		cp->in = -1;
+		cp->out = -1;
+		cp->clean_on_exit = 1;
+		strvec_push(&cp->args, "checkout--worker");
+		if (state->base_dir_len)
+			strvec_pushf(&cp->args, "--prefix=%s", state->base_dir);
+		if (start_command(cp))
+			die("failed to spawn checkout worker");
+	}
+
+	base_batch_size = parallel_checkout.nr / num_workers;
+	workers_with_one_extra_item = parallel_checkout.nr % num_workers;
+
+	for (i = 0; i < num_workers; i++) {
+		struct pc_worker *worker = &workers[i];
+		size_t batch_size = base_batch_size;
+
+		/* distribute the extra work evenly */
+		if (i < workers_with_one_extra_item)
+			batch_size++;
+
+		send_batch(worker->cp.in, batch_beginning, batch_size);
+		worker->next_item_to_complete = batch_beginning;
+		worker->nr_items_to_complete = batch_size;
+
+		batch_beginning += batch_size;
+	}
+
+	return workers;
+}
+
+static void finish_workers(struct pc_worker *workers, int num_workers)
+{
+	int i;
+
+	/*
+	 * Close pipes before calling finish_command() to let the workers
+	 * exit asynchronously and avoid spending extra time on wait().
+	 */
+	for (i = 0; i < num_workers; i++) {
+		struct child_process *cp = &workers[i].cp;
+		if (cp->in >= 0)
+			close(cp->in);
+		if (cp->out >= 0)
+			close(cp->out);
+	}
+
+	for (i = 0; i < num_workers; i++) {
+		int rc = finish_command(&workers[i].cp);
+		if (rc > 128) {
+			/*
+			 * For a normal non-zero exit, the worker should have
+			 * already printed something useful to stderr. But a
+			 * death by signal should be mentioned to the user.
+			 */
+			error("checkout worker %d died of signal %d", i, rc - 128);
+		}
+	}
+
+	free(workers);
+}
+
+static inline void assert_pc_item_result_size(int got, int exp)
+{
+	if (got != exp)
+		BUG("wrong result size from checkout worker (got %dB, exp %dB)",
+		    got, exp);
+}
+
+static void parse_and_save_result(const char *buffer, int len,
+				  struct pc_worker *worker)
+{
+	struct pc_item_result *res;
+	struct parallel_checkout_item *pc_item;
+	struct stat *st = NULL;
+
+	if (len < PC_ITEM_RESULT_BASE_SIZE)
+		BUG("too short result from checkout worker (got %dB, exp >=%dB)",
+		    len, (int)PC_ITEM_RESULT_BASE_SIZE);
+
+	res = (struct pc_item_result *)buffer;
+
+	/*
+	 * Worker should send either the full result struct on success, or
+	 * just the base (i.e. no stat data), otherwise.
+	 */
+	if (res->status == PC_ITEM_WRITTEN) {
+		assert_pc_item_result_size(len, (int)sizeof(struct pc_item_result));
+		st = &res->st;
+	} else {
+		assert_pc_item_result_size(len, (int)PC_ITEM_RESULT_BASE_SIZE);
+	}
+
+	if (!worker->nr_items_to_complete)
+		BUG("received result from supposedly finished checkout worker");
+	if (res->id != worker->next_item_to_complete)
+		BUG("unexpected item id from checkout worker (got %"PRIuMAX", exp %"PRIuMAX")",
+		    (uintmax_t)res->id, (uintmax_t)worker->next_item_to_complete);
+
+	worker->next_item_to_complete++;
+	worker->nr_items_to_complete--;
+
+	pc_item = &parallel_checkout.items[res->id];
+	pc_item->status = res->status;
+	if (st)
+		pc_item->st = *st;
+}
+
+static void gather_results_from_workers(struct pc_worker *workers,
+					int num_workers)
+{
+	int i, active_workers = num_workers;
+	struct pollfd *pfds;
+
+	CALLOC_ARRAY(pfds, num_workers);
+	for (i = 0; i < num_workers; i++) {
+		pfds[i].fd = workers[i].cp.out;
+		pfds[i].events = POLLIN;
+	}
+
+	while (active_workers) {
+		int nr = poll(pfds, num_workers, -1);
+
+		if (nr < 0) {
+			if (errno == EINTR)
+				continue;
+			die_errno("failed to poll checkout workers");
+		}
+
+		for (i = 0; i < num_workers && nr > 0; i++) {
+			struct pc_worker *worker = &workers[i];
+			struct pollfd *pfd = &pfds[i];
+
+			if (!pfd->revents)
+				continue;
+
+			if (pfd->revents & POLLIN) {
+				int len = packet_read(pfd->fd, NULL, NULL,
+						      packet_buffer,
+						      sizeof(packet_buffer), 0);
+
+				if (len < 0) {
+					BUG("packet_read() returned negative value");
+				} else if (!len) {
+					pfd->fd = -1;
+					active_workers--;
+				} else {
+					parse_and_save_result(packet_buffer,
+							      len, worker);
+				}
+			} else if (pfd->revents & POLLHUP) {
+				pfd->fd = -1;
+				active_workers--;
+			} else if (pfd->revents & (POLLNVAL | POLLERR)) {
+				die("error polling from checkout worker");
+			}
+
+			nr--;
+		}
+	}
+
+	free(pfds);
+}
+
 static void write_items_sequentially(struct checkout *state)
 {
 	size_t i;
@@ -348,16 +584,28 @@ static void write_items_sequentially(struct checkout *state)
 		write_pc_item(&parallel_checkout.items[i], state);
 }
 
+static const int DEFAULT_NUM_WORKERS = 2;
+
 int run_parallel_checkout(struct checkout *state)
 {
-	int ret;
+	int ret, num_workers = DEFAULT_NUM_WORKERS;
 
 	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
 		BUG("cannot run parallel checkout: uninitialized or already running");
 
 	parallel_checkout.status = PC_RUNNING;
 
-	write_items_sequentially(state);
+	if (parallel_checkout.nr < num_workers)
+		num_workers = parallel_checkout.nr;
+
+	if (num_workers <= 1) {
+		write_items_sequentially(state);
+	} else {
+		struct pc_worker *workers = setup_workers(state, num_workers);
+		gather_results_from_workers(workers, num_workers);
+		finish_workers(workers, num_workers);
+	}
+
 	ret = handle_results(state);
 
 	finish_parallel_checkout();
diff --git a/parallel-checkout.h b/parallel-checkout.h
index 4ad2a519b3..ec58716519 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -1,9 +1,14 @@
 #ifndef PARALLEL_CHECKOUT_H
 #define PARALLEL_CHECKOUT_H
 
+#include "convert.h"
+
 struct cache_entry;
 struct checkout;
-struct conv_attrs;
+
+/****************************************************************
+ * Users of parallel checkout
+ ****************************************************************/
 
 enum pc_status {
 	PC_UNINITIALIZED = 0,
@@ -29,4 +34,70 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
 /* Write all the queued entries, returning 0 on success.*/
 int run_parallel_checkout(struct checkout *state);
 
+/****************************************************************
+ * Interface with checkout--worker
+ ****************************************************************/
+
+enum pc_item_status {
+	PC_ITEM_PENDING = 0,
+	PC_ITEM_WRITTEN,
+	/*
+	 * The entry could not be written because there was another file
+	 * already present in its path or leading directories. Since
+	 * checkout_entry_ca() removes such files from the working tree before
+	 * enqueueing the entry for parallel checkout, it means that there was
+	 * a path collision among the entries being written.
+	 */
+	PC_ITEM_COLLIDED,
+	PC_ITEM_FAILED,
+};
+
+struct parallel_checkout_item {
+	/*
+	 * In main process ce points to a istate->cache[] entry. Thus, it's not
+	 * owned by us. In workers they own the memory, which *must be* released.
+	 */
+	struct cache_entry *ce;
+	struct conv_attrs ca;
+	size_t id; /* position in parallel_checkout.items[] of main process */
+
+	/* Output fields, sent from workers. */
+	enum pc_item_status status;
+	struct stat st;
+};
+
+/*
+ * The fixed-size portion of `struct parallel_checkout_item` that is sent to the
+ * workers. Following this will be 2 strings: ca.working_tree_encoding and
+ * ce.name; These are NOT null terminated, since we have the size in the fixed
+ * portion.
+ *
+ * Note that not all fields of conv_attrs and cache_entry are passed, only the
+ * ones that will be required by the workers to smudge and write the entry.
+ */
+struct pc_item_fixed_portion {
+	size_t id;
+	struct object_id oid;
+	unsigned int ce_mode;
+	enum convert_crlf_action crlf_action;
+	int ident;
+	size_t working_tree_encoding_len;
+	size_t name_len;
+};
+
+/*
+ * The fields of `struct parallel_checkout_item` that are returned by the
+ * workers. Note: `st` must be the last one, as it is omitted on error.
+ */
+struct pc_item_result {
+	size_t id;
+	enum pc_item_status status;
+	struct stat st;
+};
+
+#define PC_ITEM_RESULT_BASE_SIZE offsetof(struct pc_item_result, st)
+
+void write_pc_item(struct parallel_checkout_item *pc_item,
+		   struct checkout *state);
+
 #endif /* PARALLEL_CHECKOUT_H */
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 3/5] parallel-checkout: add configuration options
  2021-04-19 19:53     ` [PATCH v4 0/5] Parallel Checkout (part 2) Matheus Tavares
  2021-04-19 19:53       ` [PATCH v4 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
  2021-04-19 19:53       ` [PATCH v4 2/5] parallel-checkout: make it truly parallel Matheus Tavares
@ 2021-04-19 19:53       ` Matheus Tavares
  2021-04-19 19:53       ` [PATCH v4 4/5] parallel-checkout: support progress displaying Matheus Tavares
  2021-04-19 19:53       ` [PATCH v4 5/5] parallel-checkout: add design documentation Matheus Tavares
  4 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-19 19:53 UTC (permalink / raw)
  To: gitster; +Cc: git, christian.couder, git

Make parallel checkout configurable by introducing two new settings:
checkout.workers and checkout.thresholdForParallelism. The first defines
the number of workers (where one means sequential checkout), and the
second defines the minimum number of entries to attempt parallel
checkout.

To decide the default value for checkout.workers, the parallel version
was benchmarked during three operations in the linux repo, with cold
cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and
checking out v5.8 from v5.7 (checkout II). The four tables below show
the mean run times and standard deviations for 5 runs in: a local file
system on SSD, a local file system on HDD, a Linux NFS server, and
Amazon EFS (all on Linux). Each parallel checkout test was executed with
the number of workers that brings the best overall results in that
environment.

Local SSD:
             Sequential             10 workers            Speedup
Clone        8.805 s ± 0.043 s      3.564 s ± 0.041 s     2.47 ± 0.03
Checkout I   9.678 s ± 0.057 s      4.486 s ± 0.050 s     2.16 ± 0.03
Checkout II  5.034 s ± 0.072 s      3.021 s ± 0.038 s     1.67 ± 0.03

Local HDD:
             Sequential             10 workers             Speedup
Clone        32.288 s ± 0.580 s     30.724 s ± 0.522 s    1.05 ± 0.03
Checkout I   54.172 s ±  7.119 s    54.429 s ± 6.738 s    1.00 ± 0.18
Checkout II  40.465 s ± 2.402 s     38.682 s ± 1.365 s    1.05 ± 0.07

Linux NFS server (v4.1, on EBS, single availability zone):

             Sequential             32 workers            Speedup
Clone        240.368 s ± 6.347 s    57.349 s ± 0.870 s    4.19 ± 0.13
Checkout I   242.862 s ± 2.215 s    58.700 s ± 0.904 s    4.14 ± 0.07
Checkout II  65.751 s ± 1.577 s     23.820 s ± 0.407 s    2.76 ± 0.08

EFS (v4.1, replicated over multiple availability zones):

             Sequential             32 workers            Speedup
Clone        922.321 s ± 2.274 s    210.453 s ± 3.412 s   4.38 ± 0.07
Checkout I   1011.300 s ± 7.346 s   297.828 s ± 0.964 s   3.40 ± 0.03
Checkout II  294.104 s ± 1.836 s    126.017 s ± 1.190 s   2.33 ± 0.03

The above benchmarks show that parallel checkout is most effective on
repositories located on an SSD or over a distributed file system. For
local file systems on spinning disks, and/or older machines, the
parallelism does not always bring a good performance. For this reason,
the default value for checkout.workers is one, a.k.a. sequential
checkout.

To decide the default value for checkout.thresholdForParallelism,
another benchmark was executed in the "Local SSD" setup, where parallel
checkout showed to be beneficial. This time, we compared the runtime of
a `git checkout -f`, with and without parallelism, after randomly
removing an increasing number of files from the Linux working tree. The
"sequential fallback" column below corresponds to the executions where
checkout.workers was 10 but checkout.thresholdForParallelism was equal
to the number of to-be-updated files plus one (so that we end up writing
sequentially). Each test case was sampled 15 times, and each sample had
a randomly different set of files removed. Here are the results:

             sequential fallback   10 workers           speedup
10   files    772.3 ms ± 12.6 ms   769.0 ms ± 13.6 ms   1.00 ± 0.02
20   files    780.5 ms ± 15.8 ms   775.2 ms ±  9.2 ms   1.01 ± 0.02
50   files    806.2 ms ± 13.8 ms   767.4 ms ±  8.5 ms   1.05 ± 0.02
100  files    833.7 ms ± 21.4 ms   750.5 ms ± 16.8 ms   1.11 ± 0.04
200  files    897.6 ms ± 30.9 ms   730.5 ms ± 14.7 ms   1.23 ± 0.05
500  files   1035.4 ms ± 48.0 ms   677.1 ms ± 22.3 ms   1.53 ± 0.09
1000 files   1244.6 ms ± 35.6 ms   654.0 ms ± 38.3 ms   1.90 ± 0.12
2000 files   1488.8 ms ± 53.4 ms   658.8 ms ± 23.8 ms   2.26 ± 0.12

From the above numbers, 100 files seems to be a reasonable default value
for the threshold setting.

Note: Up to 1000 files, we observe a drop in the execution time of the
parallel code with an increase in the number of files. This is a rather
odd behavior, but it was observed in multiple repetitions. Above 1000
files, the execution time increases according to the number of files, as
one would expect.

About the test environments: Local SSD tests were executed on an
i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local
HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores
with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running
Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge
instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large
instance with 2 vCPUSs and a 1 TB EBS GP2 volume. Before each timing,
the linux repository was removed (or checked out back to its previous
state), and `sync && sysctl vm.drop_caches=3` was executed.

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/config/checkout.txt | 21 +++++++++++++++++++++
 parallel-checkout.c               | 24 +++++++++++++++++++-----
 parallel-checkout.h               |  9 +++++++--
 unpack-trees.c                    | 10 +++++++---
 4 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/Documentation/config/checkout.txt b/Documentation/config/checkout.txt
index 2cddf7b4b4..bfbca90f0e 100644
--- a/Documentation/config/checkout.txt
+++ b/Documentation/config/checkout.txt
@@ -21,3 +21,24 @@ checkout.guess::
 	Provides the default value for the `--guess` or `--no-guess`
 	option in `git checkout` and `git switch`. See
 	linkgit:git-switch[1] and linkgit:git-checkout[1].
+
+checkout.workers::
+	The number of parallel workers to use when updating the working tree.
+	The default is one, i.e. sequential execution. If set to a value less
+	than one, Git will use as many workers as the number of logical cores
+	available. This setting and `checkout.thresholdForParallelism` affect
+	all commands that perform checkout. E.g. checkout, clone, reset,
+	sparse-checkout, etc.
++
+Note: parallel checkout usually delivers better performance for repositories
+located on SSDs or over NFS. For repositories on spinning disks and/or machines
+with a small number of cores, the default sequential checkout often performs
+better. The size and compression level of a repository might also influence how
+well the parallel version performs.
+
+checkout.thresholdForParallelism::
+	When running parallel checkout with a small number of files, the cost
+	of subprocess spawning and inter-process communication might outweigh
+	the parallelization gains. This setting allows to define the minimum
+	number of files for which parallel checkout should be attempted. The
+	default is 100.
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 836154fec6..3cc2028861 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -1,10 +1,12 @@
 #include "cache.h"
+#include "config.h"
 #include "entry.h"
 #include "parallel-checkout.h"
 #include "pkt-line.h"
 #include "run-command.h"
 #include "sigchain.h"
 #include "streaming.h"
+#include "thread-utils.h"
 
 struct pc_worker {
 	struct child_process cp;
@@ -24,6 +26,20 @@ enum pc_status parallel_checkout_status(void)
 	return parallel_checkout.status;
 }
 
+static const int DEFAULT_THRESHOLD_FOR_PARALLELISM = 100;
+static const int DEFAULT_NUM_WORKERS = 1;
+
+void get_parallel_checkout_configs(int *num_workers, int *threshold)
+{
+	if (git_config_get_int("checkout.workers", num_workers))
+		*num_workers = DEFAULT_NUM_WORKERS;
+	else if (*num_workers < 1)
+		*num_workers = online_cpus();
+
+	if (git_config_get_int("checkout.thresholdForParallelism", threshold))
+		*threshold = DEFAULT_THRESHOLD_FOR_PARALLELISM;
+}
+
 void init_parallel_checkout(void)
 {
 	if (parallel_checkout.status != PC_UNINITIALIZED)
@@ -584,11 +600,9 @@ static void write_items_sequentially(struct checkout *state)
 		write_pc_item(&parallel_checkout.items[i], state);
 }
 
-static const int DEFAULT_NUM_WORKERS = 2;
-
-int run_parallel_checkout(struct checkout *state)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
 {
-	int ret, num_workers = DEFAULT_NUM_WORKERS;
+	int ret;
 
 	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
 		BUG("cannot run parallel checkout: uninitialized or already running");
@@ -598,7 +612,7 @@ int run_parallel_checkout(struct checkout *state)
 	if (parallel_checkout.nr < num_workers)
 		num_workers = parallel_checkout.nr;
 
-	if (num_workers <= 1) {
+	if (num_workers <= 1 || parallel_checkout.nr < threshold) {
 		write_items_sequentially(state);
 	} else {
 		struct pc_worker *workers = setup_workers(state, num_workers);
diff --git a/parallel-checkout.h b/parallel-checkout.h
index ec58716519..2a68ab954d 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -17,6 +17,7 @@ enum pc_status {
 };
 
 enum pc_status parallel_checkout_status(void);
+void get_parallel_checkout_configs(int *num_workers, int *threshold);
 
 /*
  * Put parallel checkout into the PC_ACCEPTING_ENTRIES state. Should be used
@@ -31,8 +32,12 @@ void init_parallel_checkout(void);
  */
 int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
 
-/* Write all the queued entries, returning 0 on success.*/
-int run_parallel_checkout(struct checkout *state);
+/*
+ * Write all the queued entries, returning 0 on success. If the number of
+ * entries is smaller than the specified threshold, the operation is performed
+ * sequentially.
+ */
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
 
 /****************************************************************
  * Interface with checkout--worker
diff --git a/unpack-trees.c b/unpack-trees.c
index f0430d458d..0669748f21 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -399,7 +399,7 @@ static int check_updates(struct unpack_trees_options *o,
 	int errs = 0;
 	struct progress *progress;
 	struct checkout state = CHECKOUT_INIT;
-	int i;
+	int i, pc_workers, pc_threshold;
 
 	trace_performance_enter();
 	state.force = 1;
@@ -465,8 +465,11 @@ static int check_updates(struct unpack_trees_options *o,
 		oid_array_clear(&to_fetch);
 	}
 
+	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+
 	enable_delayed_checkout(&state);
-	init_parallel_checkout();
+	if (pc_workers > 1)
+		init_parallel_checkout();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -480,7 +483,8 @@ static int check_updates(struct unpack_trees_options *o,
 		}
 	}
 	stop_progress(&progress);
-	errs |= run_parallel_checkout(&state);
+	if (pc_workers > 1)
+		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
 	errs |= finish_delayed_checkout(&state, NULL);
 	git_attr_set_direction(GIT_ATTR_CHECKIN);
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 4/5] parallel-checkout: support progress displaying
  2021-04-19 19:53     ` [PATCH v4 0/5] Parallel Checkout (part 2) Matheus Tavares
                         ` (2 preceding siblings ...)
  2021-04-19 19:53       ` [PATCH v4 3/5] parallel-checkout: add configuration options Matheus Tavares
@ 2021-04-19 19:53       ` Matheus Tavares
  2021-04-19 19:53       ` [PATCH v4 5/5] parallel-checkout: add design documentation Matheus Tavares
  4 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-19 19:53 UTC (permalink / raw)
  To: gitster; +Cc: git, christian.couder, git

Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 parallel-checkout.c | 34 +++++++++++++++++++++++++++++++---
 parallel-checkout.h |  5 ++++-
 unpack-trees.c      | 11 ++++++++---
 3 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/parallel-checkout.c b/parallel-checkout.c
index 3cc2028861..09e8b10a35 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -3,6 +3,7 @@
 #include "entry.h"
 #include "parallel-checkout.h"
 #include "pkt-line.h"
+#include "progress.h"
 #include "run-command.h"
 #include "sigchain.h"
 #include "streaming.h"
@@ -17,6 +18,8 @@ struct parallel_checkout {
 	enum pc_status status;
 	struct parallel_checkout_item *items; /* The parallel checkout queue. */
 	size_t nr, alloc;
+	struct progress *progress;
+	unsigned int *progress_cnt;
 };
 
 static struct parallel_checkout parallel_checkout;
@@ -146,6 +149,20 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca)
 	return 0;
 }
 
+size_t pc_queue_size(void)
+{
+	return parallel_checkout.nr;
+}
+
+static void advance_progress_meter(void)
+{
+	if (parallel_checkout.progress) {
+		(*parallel_checkout.progress_cnt)++;
+		display_progress(parallel_checkout.progress,
+				 *parallel_checkout.progress_cnt);
+	}
+}
+
 static int handle_results(struct checkout *state)
 {
 	int ret = 0;
@@ -194,6 +211,7 @@ static int handle_results(struct checkout *state)
 			 */
 			ret |= checkout_entry_ca(pc_item->ce, &pc_item->ca,
 						 state, NULL, NULL);
+			advance_progress_meter();
 			break;
 		case PC_ITEM_PENDING:
 			have_pending = 1;
@@ -534,6 +552,9 @@ static void parse_and_save_result(const char *buffer, int len,
 	pc_item->status = res->status;
 	if (st)
 		pc_item->st = *st;
+
+	if (res->status != PC_ITEM_COLLIDED)
+		advance_progress_meter();
 }
 
 static void gather_results_from_workers(struct pc_worker *workers,
@@ -596,11 +617,16 @@ static void write_items_sequentially(struct checkout *state)
 {
 	size_t i;
 
-	for (i = 0; i < parallel_checkout.nr; i++)
-		write_pc_item(&parallel_checkout.items[i], state);
+	for (i = 0; i < parallel_checkout.nr; i++) {
+		struct parallel_checkout_item *pc_item = &parallel_checkout.items[i];
+		write_pc_item(pc_item, state);
+		if (pc_item->status != PC_ITEM_COLLIDED)
+			advance_progress_meter();
+	}
 }
 
-int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold,
+			  struct progress *progress, unsigned int *progress_cnt)
 {
 	int ret;
 
@@ -608,6 +634,8 @@ int run_parallel_checkout(struct checkout *state, int num_workers, int threshold
 		BUG("cannot run parallel checkout: uninitialized or already running");
 
 	parallel_checkout.status = PC_RUNNING;
+	parallel_checkout.progress = progress;
+	parallel_checkout.progress_cnt = progress_cnt;
 
 	if (parallel_checkout.nr < num_workers)
 		num_workers = parallel_checkout.nr;
diff --git a/parallel-checkout.h b/parallel-checkout.h
index 2a68ab954d..80f539bcb7 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -5,6 +5,7 @@
 
 struct cache_entry;
 struct checkout;
+struct progress;
 
 /****************************************************************
  * Users of parallel checkout
@@ -31,13 +32,15 @@ void init_parallel_checkout(void);
  * for later write and return 0.
  */
 int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
+size_t pc_queue_size(void);
 
 /*
  * Write all the queued entries, returning 0 on success. If the number of
  * entries is smaller than the specified threshold, the operation is performed
  * sequentially.
  */
-int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold,
+			  struct progress *progress, unsigned int *progress_cnt);
 
 /****************************************************************
  * Interface with checkout--worker
diff --git a/unpack-trees.c b/unpack-trees.c
index 0669748f21..4b77e52c6b 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -474,17 +474,22 @@ static int check_updates(struct unpack_trees_options *o,
 		struct cache_entry *ce = index->cache[i];
 
 		if (ce->ce_flags & CE_UPDATE) {
+			size_t last_pc_queue_size = pc_queue_size();
+
 			if (ce->ce_flags & CE_WT_REMOVE)
 				BUG("both update and delete flags are set on %s",
 				    ce->name);
-			display_progress(progress, ++cnt);
 			ce->ce_flags &= ~CE_UPDATE;
 			errs |= checkout_entry(ce, &state, NULL, NULL);
+
+			if (last_pc_queue_size == pc_queue_size())
+				display_progress(progress, ++cnt);
 		}
 	}
-	stop_progress(&progress);
 	if (pc_workers > 1)
-		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
+		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold,
+					      progress, &cnt);
+	stop_progress(&progress);
 	errs |= finish_delayed_checkout(&state, NULL);
 	git_attr_set_direction(GIT_ATTR_CHECKIN);
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 5/5] parallel-checkout: add design documentation
  2021-04-19 19:53     ` [PATCH v4 0/5] Parallel Checkout (part 2) Matheus Tavares
                         ` (3 preceding siblings ...)
  2021-04-19 19:53       ` [PATCH v4 4/5] parallel-checkout: support progress displaying Matheus Tavares
@ 2021-04-19 19:53       ` Matheus Tavares
  4 siblings, 0 replies; 40+ messages in thread
From: Matheus Tavares @ 2021-04-19 19:53 UTC (permalink / raw)
  To: gitster; +Cc: git, christian.couder, git

Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/Makefile                        |   1 +
 Documentation/technical/parallel-checkout.txt | 270 ++++++++++++++++++
 2 files changed, 271 insertions(+)
 create mode 100644 Documentation/technical/parallel-checkout.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 81d1bf7a04..af236927c9 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -90,6 +90,7 @@ TECH_DOCS += technical/multi-pack-index
 TECH_DOCS += technical/pack-format
 TECH_DOCS += technical/pack-heuristics
 TECH_DOCS += technical/pack-protocol
+TECH_DOCS += technical/parallel-checkout
 TECH_DOCS += technical/partial-clone
 TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
diff --git a/Documentation/technical/parallel-checkout.txt b/Documentation/technical/parallel-checkout.txt
new file mode 100644
index 0000000000..e790258a1a
--- /dev/null
+++ b/Documentation/technical/parallel-checkout.txt
@@ -0,0 +1,270 @@
+Parallel Checkout Design Notes
+==============================
+
+The "Parallel Checkout" feature attempts to use multiple processes to
+parallelize the work of uncompressing the blobs, applying in-core
+filters, and writing the resulting contents to the working tree during a
+checkout operation. It can be used by all checkout-related commands,
+such as `clone`, `checkout`, `reset`, `sparse-checkout`, and others.
+
+These commands share the following basic structure:
+
+* Step 1: Read the current index file into memory.
+
+* Step 2: Modify the in-memory index based upon the command, and
+  temporarily mark all cache entries that need to be updated.
+
+* Step 3: Populate the working tree to match the new candidate index.
+  This includes iterating over all of the to-be-updated cache entries
+  and delete, create, or overwrite the associated files in the working
+  tree.
+
+* Step 4: Write the new index to disk.
+
+Step 3 is the focus of the "parallel checkout" effort described here.
+
+Sequential Implementation
+-------------------------
+
+For the purposes of discussion here, the current sequential
+implementation of Step 3 is divided in 3 parts, each one implemented in
+its own function:
+
+* Step 3a: `unpack-trees.c:check_updates()` contains a series of
+  sequential loops iterating over the `cache_entry`'s array. The main
+  loop in this function calls the Step 3b function for each of the
+  to-be-updated entries.
+
+* Step 3b: `entry.c:checkout_entry()` examines the existing working tree
+  for file conflicts, collisions, and unsaved changes. It removes files
+  and creates leading directories as necessary. It calls the Step 3c
+  function for each entry to be written.
+
+* Step 3c: `entry.c:write_entry()` loads the blob into memory, smudges
+  it if necessary, creates the file in the working tree, writes the
+  smudged contents, calls `fstat()` or `lstat()`, and updates the
+  associated `cache_entry` struct with the stat information gathered.
+
+It wouldn't be safe to perform Step 3b in parallel, as there could be
+race conditions between file creations and removals. Instead, the
+parallel checkout framework lets the sequential code handle Step 3b,
+and uses parallel workers to replace the sequential
+`entry.c:write_entry()` calls from Step 3c.
+
+Rejected Multi-Threaded Solution
+--------------------------------
+
+The most "straightforward" implementation would be to spread the set of
+to-be-updated cache entries across multiple threads. But due to the
+thread-unsafe functions in the ODB code, we would have to use locks to
+coordinate the parallel operation. An early prototype of this solution
+showed that the multi-threaded checkout would bring performance
+improvements over the sequential code, but there was still too much lock
+contention. A `perf` profiling indicated that around 20% of the runtime
+during a local Linux clone (on an SSD) was spent in locking functions.
+For this reason this approach was rejected in favor of using multiple
+child processes, which led to a better performance.
+
+Multi-Process Solution
+----------------------
+
+Parallel checkout alters the aforementioned Step 3 to use multiple
+`checkout--worker` background processes to distribute the work. The
+long-running worker processes are controlled by the foreground Git
+command using the existing run-command API.
+
+Overview
+~~~~~~~~
+
+Step 3b is only slightly altered; for each entry to be checked out, the
+main process performs the following steps:
+
+* M1: Check whether there is any untracked or unclean file in the
+  working tree which would be overwritten by this entry, and decide
+  whether to proceed (removing the file(s)) or not.
+
+* M2: Create the leading directories.
+
+* M3: Load the conversion attributes for the entry's path.
+
+* M4: Check, based on the entry's type and conversion attributes,
+  whether the entry is eligible for parallel checkout (more on this
+  later). If it is eligible, enqueue the entry and the loaded
+  attributes to later write the entry in parallel. If not, write the
+  entry right away, using the default sequential code.
+
+Note: we save the conversion attributes associated with each entry
+because the workers don't have access to the main process' index state,
+so they can't load the attributes by themselves (and the attributes are
+needed to properly smudge the entry). Additionally, this has a positive
+impact on performance as (1) we don't need to load the attributes twice
+and (2) the attributes machinery is optimized to handle paths in
+sequential order.
+
+After all entries have passed through the above steps, the main process
+checks if the number of enqueued entries is sufficient to spread among
+the workers. If not, it just writes them sequentially. Otherwise, it
+spawns the workers and distributes the queued entries uniformly in
+continuous chunks. This aims to minimize the chances of two workers
+writing to the same directory simultaneously, which could increase lock
+contention in the kernel.
+
+Then, for each assigned item, each worker:
+
+* W1: Checks if there is any non-directory file in the leading part of
+  the entry's path or if there already exists a file at the entry' path.
+  If so, mark the entry with `PC_ITEM_COLLIDED` and skip it (more on
+  this later).
+
+* W2: Creates the file (with O_CREAT and O_EXCL).
+
+* W3: Loads the blob into memory (inflating and delta reconstructing
+  it).
+
+* W4: Applies any required in-process filter, like end-of-line
+  conversion and re-encoding.
+
+* W5: Writes the result to the file descriptor opened at W2.
+
+* W6: Calls `fstat()` or lstat()` on the just-written path, and sends
+  the result back to the main process, together with the end status of
+  the operation and the item's identification number.
+
+Note that, when possible, steps W3 to W5 are delegated to the streaming
+machinery, removing the need to keep the entire blob in memory.
+
+If the worker fails to read the blob or to write it to the working tree,
+it removes the created file to avoid leaving empty files behind. This is
+the *only* time a worker is allowed to remove a file.
+
+As mentioned earlier, it is the responsibility of the main process to
+remove any file that blocks the checkout operation (or abort if the
+removal(s) would cause data loss and the user didn't ask to `--force`).
+This is crucial to avoid race conditions and also to properly detect
+path collisions at Step W1.
+
+After the workers finish writing the items and sending back the required
+information, the main process handles the results in two steps:
+
+- First, it updates the in-memory index with the `lstat()` information
+  sent by the workers. (This must be done first as this information
+  might me required in the following step.)
+
+- Then it writes the items which collided on disk (i.e. items marked
+  with `PC_ITEM_COLLIDED`). More on this below.
+
+Path Collisions
+---------------
+
+Path collisions happen when two different paths correspond to the same
+entry in the file system. E.g. the paths 'a' and 'A' would collide in a
+case-insensitive file system.
+
+The sequential checkout deals with collisions in the same way that it
+deals with files that were already present in the working tree before
+checkout. Basically, it checks if the path that it wants to write
+already exists on disk, makes sure the existing file doesn't have
+unsaved data, and then overwrites it. (To be more pedantic: it deletes
+the existing file and creates the new one.) So, if there are multiple
+colliding files to be checked out, the sequential code will write each
+one of them but only the last will actually survive on disk.
+
+Parallel checkout aims to reproduce the same behavior. However, we
+cannot let the workers racily write to the same file on disk. Instead,
+the workers detect when the entry that they want to check out would
+collide with an existing file, and mark it with `PC_ITEM_COLLIDED`.
+Later, the main process can sequentially feed these entries back to
+`checkout_entry()` without the risk of race conditions. On clone, this
+also has the effect of marking the colliding entries to later emit a
+warning for the user, like the classic sequential checkout does.
+
+The workers are able to detect both collisions among the entries being
+concurrently written and collisions between a parallel-eligible entry
+and an ineligible entry. The general idea for collision detection is
+quite straightforward: for each parallel-eligible entry, the main
+process must remove all files that prevent this entry from being written
+(before enqueueing it). This includes any non-directory file in the
+leading path of the entry. Later, when a worker gets assigned the entry,
+it looks again for the non-directories files and for an already existing
+file at the entry's path. If any of these checks finds something, the
+worker knows that there was a path collision.
+
+Because parallel checkout can distinguish path collisions from the case
+where the file was already present in the working tree before checkout,
+we could alternatively choose to skip the checkout of colliding entries.
+However, each entry that doesn't get written would have NULL `lstat()`
+fields on the index. This could cause performance penalties for
+subsequent commands that need to refresh the index, as they would have
+to go to the file system to see if the entry is dirty. Thus, if we have
+N entries in a colliding group and we decide to write and `lstat()` only
+one of them, every subsequent `git-status` will have to read, convert,
+and hash the written file N - 1 times. By checking out all colliding
+entries (like the sequential code does), we only pay the overhead once,
+during checkout.
+
+Eligible Entries for Parallel Checkout
+--------------------------------------
+
+As previously mentioned, not all entries passed to `checkout_entry()`
+will be considered eligible for parallel checkout. More specifically, we
+exclude:
+
+- Symbolic links; to avoid race conditions that, in combination with
+  path collisions, could cause workers to write files at the wrong
+  place. For example, if we were to concurrently check out a symlink
+  'a' -> 'b' and a regular file 'A/f' in a case-insensitive file system,
+  we could potentially end up writing the file 'A/f' at 'a/f', due to a
+  race condition.
+
+- Regular files that require external filters (either "one shot" filters
+  or long-running process filters). These filters are black-boxes to Git
+  and may have their own internal locking or non-concurrent assumptions.
+  So it might not be safe to run multiple instances in parallel.
++
+Besides, long-running filters may use the delayed checkout feature to
+postpone the return of some filtered blobs. The delayed checkout queue
+and the parallel checkout queue are not compatible and should remain
+separate.
++
+Note: regular files that only require internal filters, like end-of-line
+conversion and re-encoding, are eligible for parallel checkout.
+
+Ineligible entries are checked out by the classic sequential codepath
+*before* spawning workers.
+
+Note: submodules's files are also eligible for parallel checkout (as
+long as they don't fall into any of the excluding categories mentioned
+above). But since each submodule is checked out in its own child
+process, we don't mix the superproject's and the submodules' files in
+the same parallel checkout process or queue.
+
+The API
+-------
+
+The parallel checkout API was designed with the goal of minimizing
+changes to the current users of the checkout machinery. This means that
+they don't have to call a different function for sequential or parallel
+checkout. As already mentioned, `checkout_entry()` will automatically
+insert the given entry in the parallel checkout queue when this feature
+is enabled and the entry is eligible; otherwise, it will just write the
+entry right away, using the sequential code. In general, callers of the
+parallel checkout API should look similar to this:
+
+----------------------------------------------
+int pc_workers, pc_threshold, err = 0;
+struct checkout state;
+
+get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+
+/*
+ * This check is not strictly required, but it
+ * should save some time in sequential mode.
+ */
+if (pc_workers > 1)
+	init_parallel_checkout();
+
+for (each cache_entry ce to-be-updated)
+	err |= checkout_entry(ce, &state, NULL, NULL);
+
+err |= run_parallel_checkout(&state, pc_workers, pc_threshold, NULL, NULL);
+----------------------------------------------
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2021-04-19 19:53 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
2021-03-17 21:12 ` [PATCH 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-03-31  4:22   ` Christian Couder
2021-04-02 14:39     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-03-31  4:32   ` Christian Couder
2021-04-02 14:42     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-03-31  4:33   ` Christian Couder
2021-04-02 14:45     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-03-17 21:12 ` [PATCH 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-03-31  5:36   ` Christian Couder
2021-03-18 20:56 ` [PATCH 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-03-19  3:24   ` Matheus Tavares
2021-03-19 22:58     ` Junio C Hamano
2021-03-31  5:42 ` Christian Couder
2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-08 19:52   ` [PATCH v2 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-04-16 21:43   ` Junio C Hamano
2021-04-17 19:57     ` Matheus Tavares Bernardino
2021-04-19  9:41     ` Christian Couder
2021-04-19  0:14   ` [PATCH v3 " Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-19  9:36       ` Christian Couder
2021-04-19 19:53     ` [PATCH v4 0/5] Parallel Checkout (part 2) Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 5/5] parallel-checkout: add design documentation Matheus Tavares

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).