From: Christian Couder <christian.couder@gmail.com>
To: Matheus Tavares <matheus.bernardino@usp.br>
Cc: git <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>,
Jeff Hostetler <git@jeffhostetler.com>
Subject: Re: [PATCH 5/5] parallel-checkout: add design documentation
Date: Wed, 31 Mar 2021 07:36:02 +0200 [thread overview]
Message-ID: <CAP8UFD3XkU6H==pHCFLpchKrb9cNj8fFnmF4HmCG-SaYj8C-Vg@mail.gmail.com> (raw)
In-Reply-To: <0592740ec14c4ab72b8d46a7fecf1c66e7a497fd.1616015337.git.matheus.bernardino@usp.br>
On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
> +For the purposes of discussion here, the current sequential
> +implementation of Step 3 has 3 layers:
You refer to these layers as "steps" below, so you might want to use
"sub-steps" instead of "layers".
> +* Step 3a: `unpack-trees.c:check_updates()` contains a series of
> + sequential loops iterating over the `cache_entry`'s array. The main
> + loop in this function calls the next layer for each of the
Not sure what "layer" means here. Does it mean Step 3b below? In this
case I would suggest using "Step 3b" instead of "layer".
> + to-be-updated entries.
> +
> +* Step 3b: `entry.c:checkout_entry()` examines the existing working tree
> + for file conflicts, collisions, and unsaved changes. It removes files
> + and create leading directories as necessary. It calls the next layer
s/create/creates/
I guess the "next layer" is Step 3c below.
> + for each entry to be written.
> +
> +* Step 3c: `entry.c:write_entry()` loads the blob into memory, smudges
> + it if necessary, creates the file in the working tree, writes the
> + smudged contents, calls `fstat()` or `lstat()`, and updates the
> + associated `cache_entry` struct with the stat information gathered.
> +
> +It wouldn't be safe to perform Step 3b in parallel, as there could be
> +race conditions between file creations and removals. Instead, the
> +parallel checkout framework lets the sequential code handle Step 3b,
> +and use parallel workers to replace the sequential
> +`entry.c:write_entry()` calls from Step 3c.
Ok.
> +Rejected Multi-Threaded Solution
> +--------------------------------
> +
> +The most "straightforward" implementation would be to spread the set of
> +to-be-updated cache entries across multiple threads. But due to the
> +thread-unsafe functions in the ODB code, we would have to use locks to
> +coordinate the parallel operation. An early prototype of this solution
> +showed that the multi-threaded checkout would bring performance
> +improvements over the sequential code, but there was still too much lock
> +contention. A `perf` profiling indicated that around 20% of the runtime
> +during a local Linux clone (on an SSD) was spent in locking functions.
> +For this reason this approach was rejected in favor of using multiple
> +child processes, which led to a better performance.
Nice explanation.
> +Multi-Process Solution
> +----------------------
> +
> +Parallel checkout alters the aforementioned Step 3 to use multiple
> +`checkout--helper` background processes to distribute the work. The
> +long-running worker processes are controlled by the foreground Git
> +command using the existing run-command API.
> +
> +Overview
> +~~~~~~~~
> +
> +Step 3b is only slightly altered; for each entry to be checked out, the
> +main process:
Maybe: s/main process:/main process performs the following steps:/
If you apply this suggestion, you may also want the following below:
s/M1: Checks/M1: Check/
s/and decides/and decide/
s/M2: Creates/M2: Create/
...
> +* M1: Checks whether there is any untracked or unclean file in the
> + working tree which would be overwritten by this entry, and decides
> + whether to proceed (removing the file(s)) or not.
> +
> +* M2: Creates the leading directories.
> +
> +* M3: Loads the conversion attributes for the entry's path.
> +
> +* M4: Checks, based on the entry's type and conversion attributes,
> + whether the entry is eligible for parallel checkout (more on this
> + later). If it is eligible, enqueues the entry and the loaded
> + attributes to later write the entry in parallel. If not, writes the
> + entry right away, using the default sequential code.
> +
> +Note: we save the conversion attributes associated with each entry
> +because the workers don't have access to the main process' index state,
> +so they can't load the attributes by themselves (and the attributes are
> +needed to properly smudge the entry). Additionally, this has a positive
> +impact on performance as (1) we don't need to load the attributes twice
> +and (2) the attributes machinery is optimized to handle paths in
> +sequential order.
Nice!
> +After all entries have passed through the above steps, the main process
> +checks if the number of enqueued entries is sufficient to spread among
> +the workers. If not, it just writes them sequentially. Otherwise, it
> +spawns the workers and distributes the queued entries uniformly in
> +continuous chunks. This aims to minimize the chances of two workers
> +writing to the same directory simultaneously, which could increase lock
> +contention in the kernel.
> +
> +Then, for each assigned item, each worker:
> +
> +* W1: Checks if there is any non-directory file in the leading part of
> + the entry's path or if there already exists a file at the entry' path.
> + If so, mark the entry with `PC_ITEM_COLLIDED` and skip it (more on
> + this later).
> +
> +* W2: Creates the file (with O_CREAT and O_EXCL).
> +
> +* W3: Loads the blob into memory (inflating and delta reconstructing
> + it).
> +
> +* W4: Filters the blob.
Not sure what "Filters" means here. Is this related to the smudge filter?
> +* W5: Writes the result to the file descriptor opened at W2.
> +
> +* W6: Calls `fstat()` or lstat()` on the just-written path, and sends
> + the result back to the main process, together with the end status of
> + the operation and the item's identification number.
> +
> +Note that steps W3 to W5 might actually be performed together, using the
> +streaming interface.
Not sure what "performed together" means here. Does it mean by a
single function or set of functions?
> +Also note that the workers *never* remove any files. As mentioned
Maybe: s/any files/any file/
> +earlier, it is the responsibility of the main process to remove any
> +files that block the checkout operation (or abort it). This is crucial
Maybe: s/files/file/ and s/block/blocks/ and s/abort/aborts/
> +to avoid race conditions and also to properly detect path collisions at
> +Step W1.
> +
> +After the workers finish writing the items and sending back the required
> +information, the main process handles the results in two steps:
> +
> +- First, it updates the in-memory index with the `lstat()` information
> + sent by the workers. (This must be done first as this information
> + might me required in the following step.)
> +
> +- Then it writes the items which collided on disk (i.e. items marked
> + with `PC_ITEM_COLLIDED`). More on this below.
> +
> +Path Collisions
> +---------------
> +
> +Path collisions happen when two different paths correspond to the same
> +entry in the file system. E.g. the paths 'a' and 'A' would collide in a
> +case-insensitive file system.
> +
> +The sequential checkout deals with collisions in the same way that it
> +deals with files that were already present in the working tree before
> +checkout. Basically, it checks if the path that it wants to write
> +already exists on disk, makes sure the existing file doesn't have
> +unsaved data, and then overwrite it. (To be more pedantic: it deletes
s/overwrite/overwrites/
> +the existing file and creates the new one.) So, if there are multiple
> +colliding files to be checked out, the sequential code will write each
> +one of them but only the last will actually survive on disk.
> +
> +Parallel checkout aims to reproduce the same behavior. However, we
> +cannot let the workers racily write to the same file on disk. Instead,
> +the workers detect when the entry that they want to check out would
> +collide with an existing file, and mark it with `PC_ITEM_COLLIDED`.
> +Later, the main process can sequentially feed these entries back to
> +`checkout_entry()` without the risk of race conditions. On clone, this
> +also has the effect of marking the colliding entries to later emit a
> +warning for the user, like the classic sequential checkout does.
> +
> +The workers are able to detect both collisions among the entries being
> +concurrently written and collisions among parallel-eligible and
> +ineligible entries. The general idea for collision detection is quite
> +straightforward: for each parallel-eligible entry, the main process must
> +remove all files that prevent this entry from being written (before
> +enqueueing it). This includes any non-directory file in the leading path
> +of the entry. Later, when a worker gets assigned the entry, it looks
> +again for the non-directories files and for an already existent file at
Maybe: s/existent/existing/
> +the entry's path. If any of these checks finds something, the worker
> +knows that there was a path collision.
> +
> +Because parallel checkout can distinguish path collisions from the case
> +where the file was already present in the working tree before checkout,
> +we could alternatively choose to skip the checkout of colliding entries.
> +However, each entry that doesn't get written would have NULL `lstat()`
> +fields on the index. This could cause performance penalties for
> +subsequent commands that need to refresh the index, as they would have
> +to go to the file system to see if the entry is dirty. Thus, if we have
> +N entries in a colliding group and we decide to write and `lstat()` only
> +one of them, every subsequent `git-status` will have to read, convert,
> +and hash the written file N - 1 times. By checking out all colliding
> +entries (like the sequential code does), we only pay the overhead once,
> +during checkout.
> +
> +Eligible Entries for Parallel Checkout
> +--------------------------------------
> +
> +As previously mentioned, not all entries passed to `checkout_entry()`
> +will be considered eligible for parallel checkout. More specifically, we
> +exclude:
> +
> +- Symbolic links; to avoid race conditions that, in combination with
> + path collisions, could cause workers to write files at the wrong
> + place. For example, if we were to concurrently check out a symlink
> + 'a' -> 'b' and a regular file 'A/f' in a case-insensitive file system,
> + we could potentially end up writing the file 'A/f' at 'a/f', due to a
> + race condition.
> +
> +- Regular files that require external filters (either "one shot" filters
> + or long-running process filters). These filters are black-boxes to Git
> + and may have their own internal locking or non-concurrent assumptions.
> + So it might not be safe to run multiple instances in parallel.
> ++
> +Besides, long-running filters may use the delayed checkout feature to
> +postpone the return of some filtered blobs. The delayed checkout queue
> +and the parallel checkout queue are not compatible and should remain
> +separated.
Are files that require some other internal filters eligible though?
> +Ineligible entries are checked out by the classic sequential codepath
> +*before* spawning workers.
> +
> +Note: submodules's files are also eligible for parallel checkout (as
> +long as they don't fall into the two excluding categories mentioned
> +above). But since each submodule is checked out in its own child
> +process, we don't mix the superproject's and the submodules' files in
> +the same parallel checkout process or queue.
Ok.
> +The API
> +-------
> +
> +The parallel checkout API was designed with the goal to minimize changes
> +to the current users of the checkout machinery. This means that they
> +don't have to call a different function for sequential or parallel
> +checkout. As already mentioned, `checkout_entry()` will automatically
> +insert the given entry in the parallel checkout queue when this feature
> +is enabled and the entry is eligible; otherwise, it will just write the
> +entry right away, using the sequential code. In general, callers of the
> +parallel checkout API should look similar to this:
> +
> +----------------------------------------------
> +int pc_workers, pc_threshold, err = 0;
> +struct checkout state;
> +
> +get_parallel_checkout_configs(&pc_workers, &pc_threshold);
> +
> +/*
> + * This check is not strictly required, but it
> + * should save some time in sequential mode.
> + */
It might be nice if this comment was also in front of the real code.
> +if (pc_workers > 1)
> + init_parallel_checkout();
> +
> +for (each cache_entry ce to-be-updated)
> + err |= checkout_entry(ce, &state, NULL, NULL);
> +
> +err |= run_parallel_checkout(&state, pc_workers, pc_threshold, NULL, NULL);
next prev parent reply other threads:[~2021-03-31 5:37 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
2021-03-17 21:12 ` [PATCH 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-03-31 4:22 ` Christian Couder
2021-04-02 14:39 ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-03-31 4:32 ` Christian Couder
2021-04-02 14:42 ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-03-31 4:33 ` Christian Couder
2021-04-02 14:45 ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-03-17 21:12 ` [PATCH 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-03-31 5:36 ` Christian Couder [this message]
2021-03-18 20:56 ` [PATCH 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-03-19 3:24 ` Matheus Tavares
2021-03-19 22:58 ` Junio C Hamano
2021-03-31 5:42 ` Christian Couder
2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
2021-04-08 16:17 ` [PATCH v2 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-08 16:17 ` [PATCH v2 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-08 16:17 ` [PATCH v2 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-08 16:17 ` [PATCH v2 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-08 16:17 ` [PATCH v2 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-08 19:52 ` [PATCH v2 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-04-16 21:43 ` Junio C Hamano
2021-04-17 19:57 ` Matheus Tavares Bernardino
2021-04-19 9:41 ` Christian Couder
2021-04-19 0:14 ` [PATCH v3 " Matheus Tavares
2021-04-19 0:14 ` [PATCH v3 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19 0:14 ` [PATCH v3 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19 0:14 ` [PATCH v3 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19 0:14 ` [PATCH v3 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19 0:14 ` [PATCH v3 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-19 9:36 ` Christian Couder
2021-04-19 19:53 ` [PATCH v4 0/5] Parallel Checkout (part 2) Matheus Tavares
2021-04-19 19:53 ` [PATCH v4 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19 19:53 ` [PATCH v4 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19 19:53 ` [PATCH v4 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19 19:53 ` [PATCH v4 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19 19:53 ` [PATCH v4 5/5] parallel-checkout: add design documentation Matheus Tavares
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAP8UFD3XkU6H==pHCFLpchKrb9cNj8fFnmF4HmCG-SaYj8C-Vg@mail.gmail.com' \
--to=christian.couder@gmail.com \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=matheus.bernardino@usp.br \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).