All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: peff@peff.net, avarab@gmail.com, git@jeffhostetler.com,
	jrnieder@google.com, steadmon@google.com,
	Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH 12/17] Documentation: describe split commit-graphs
Date: Wed, 08 May 2019 08:53:57 -0700 (PDT)	[thread overview]
Message-ID: <7bbe8d9150a623ea684c94d129eda1607dd32a79.1557330827.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.184.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

The design for the split commit-graphs uses file names to force
a "stack" of commit-graph files. This allows incremental writes
without updating the file format.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph.txt | 142 +++++++++++++++++++++++
 1 file changed, 142 insertions(+)

diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
index fb53341d5e..ca1661d2d8 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -127,6 +127,148 @@ Design Details
   helpful for these clones, anyway. The commit-graph will not be read or
   written when shallow commits are present.
 
+Split Commit Graphs
+-------------------
+
+Typically, repos grow with near-constant velocity (commits per day). Over
+time, the number of commits added by a fetch operation is much smaller than
+the number of commits in the full history. The split commit-graph feature
+allows for fast writes of new commit data without rewriting the entire
+commit history -- at least, most of the time.
+
+## File Layout
+
+A split commit-graph uses multiple files, and we use a fixed naming
+convention to organize these files. The base commit-graph file is the
+same: `$OBJDIR/info/commit-graph`. The rest of the commit-graph files have
+the format `$OBJDIR/info/commit-graphs/commit-graph-<N>` where N is a
+positive integer. The integers must start at 1 and grow sequentially
+to form a stack of files.
+
+Each `commit-graph-<N>` file has the same format as the `commit-graph`
+file, including a lexicographic list of commit ids. The only difference
+is that this list is considered to be concatenated to the list from
+the lower commit-graphs. As an example, consider this diagram of three
+files:
+
+ +-----------------------+
+ |  commit-graph-2       |
+ +-----------------------+
+	  |
+ +-----------------------+
+ |                       |
+ |  commit-graph-1       |
+ |                       |
+ +-----------------------+
+	  |
+ +-----------------------+
+ |                       |
+ |                       |
+ |                       |
+ |  commit-graph         |
+ |                       |
+ |                       |
+ |                       |
+ +-----------------------+
+
+Let X0 be the number of commits in `commit-graph`, X1 be the number
+of commits in commit-graph-1, and X2 be the number of commits in
+commit-graph-2. If a commit appears in position i in `commit-graph-2`,
+then we interpret this as being the commit in position (X0 + X1 + i),
+and that will be used as its "graph position". The commits in
+commit-graph-2 use these positions to refer to their parents, which
+may be in commit-graph-1 or commit-graph. We can navigate to an
+arbitrary commit in position j by checking its containment in the
+intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 + X2).
+
+When Git reads from these files, it starts by acquiring a read handle
+on the `commit-graph` file. On success, it continues acquiring read
+handles on the `commit-graph-<N>` files in increasing order. This
+order is important for how we replace the files.
+
+## Merging commit-graph files
+
+If we only added a `commit-graph-<N>` file on every write, we would
+run into a linear search problem through many commit-graph files.
+Instead, we use a merge strategy to decide when the stack should
+collapse some number of levels.
+
+The diagram below shows such a collapse. As a set of new commits
+are added, it is determined by the merge strategy that the files
+should collapse to `commit-graph-1`. Thus, the new commits, the
+commits in `commit-graph-2` and the commits in `commit-graph-1`
+should be combined into a new `commit-graph-1` file.
+
+			    +---------------------+
+			    |                     |
+			    |    (new commits)    |
+			    |                     |
+			    +---------------------+
+			    |                     |
+ +-----------------------+  +---------------------+
+ |  commit-graph-2       |->|                     |
+ +-----------------------+  +---------------------+
+	  |                 |                     |
+ +-----------------------+  +---------------------+
+ |                       |  |                     |
+ |  commit-graph-1       |->|                     |
+ |                       |  |                     |
+ +-----------------------+  +---------------------+
+	  |                   commit-graph-1.lock
+ +-----------------------+
+ |                       |
+ |                       |
+ |                       |
+ |  commit-graph         |
+ |                       |
+ |                       |
+ |                       |
+ +-----------------------+
+
+During this process, the commits to write are combined, sorted
+and we write the contents to the `commit-graph-1.lock` file.
+When the file is flushed and ready to swap to `commit-graph-1`,
+we first unlink the files above our target file. This unlinking
+is done from the top of the stack, the reverse direction that
+another process would use to read the stack.
+
+During this time window, another process trying to read the
+commit-graph stack could read `commit-graph-1` before the swap
+but try to read `commit-graph-2` after it is unlinked. That
+process would then believe that this stack is complete, but
+will miss out on the performance benefits of the commits in
+`commit-graph-2`. For this reason, the stack above the
+`commit-graph` file should be small.
+
+## Merge Strategy
+
+When writing a set of commits that do not exist in the
+commit-graph stack of height N, we default to creating
+a new file at level N + 1. We then decide to merge
+with the Nth level if one of two conditions hold:
+
+  1. The expected file size for level N + 1 is at
+     least half the file size for level N.
+
+  2. Level N + 1 contains more than MAX_SPLIT_COMMITS
+     commits (64,0000 commits).
+
+This decision cascades down the levels: when we
+merge a level we create a new set of commits that
+then compares to the next level.
+
+The first condition bounds the number of levels
+to be logarithmic in the total number of commits.
+The second condition bounds the total number of
+commits in a `commit-graph-N` file and not in
+the `commit-graph` file, preventing significant
+performance issues when the stack merges and another
+process only partially reads the previous stack.
+
+The merge strategy values (2 for the size multiple,
+64,000 for the maximum number of commits) could be
+extracted into config settings for full flexibility.
+
 Related Links
 -------------
 [0] https://bugs.chromium.org/p/git/issues/detail?id=8
-- 
gitgitgadget


  parent reply	other threads:[~2019-05-08 15:54 UTC|newest]

Thread overview: 136+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-08 15:53 [PATCH 00/17] [RFC] Commit-graph: Write incremental files Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 01/17] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 02/17] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 03/17] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 04/17] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 05/17] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 06/17] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 07/17] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 08/17] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 09/17] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 10/17] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 11/17] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` Derrick Stolee via GitGitGadget [this message]
2019-05-08 17:20   ` [PATCH 12/17] Documentation: describe split commit-graphs SZEDER Gábor
2019-05-08 19:00     ` Derrick Stolee
2019-05-08 20:11       ` Ævar Arnfjörð Bjarmason
2019-05-09  4:49         ` Junio C Hamano
2019-05-09 12:25           ` Derrick Stolee
2019-05-09 13:45         ` Derrick Stolee
2019-05-09 15:48           ` Ævar Arnfjörð Bjarmason
2019-05-09 17:08             ` Derrick Stolee
2019-05-09 21:45               ` Ævar Arnfjörð Bjarmason
2019-05-10 12:44                 ` Derrick Stolee
2019-05-08 15:53 ` [PATCH 13/17] commit-graph: lay groundwork for incremental files Derrick Stolee via GitGitGadget
2019-05-08 15:53 ` [PATCH 14/17] commit-graph: load split commit-graph files Derrick Stolee via GitGitGadget
2019-05-08 15:54 ` [PATCH 15/17] commit-graph: write " Derrick Stolee via GitGitGadget
2019-05-08 15:54 ` [PATCH 16/17] commit-graph: add --split option Derrick Stolee via GitGitGadget
2019-05-08 15:54 ` [PATCH 17/17] fetch: add fetch.writeCommitGraph config setting Derrick Stolee via GitGitGadget
2019-05-09  8:07   ` Ævar Arnfjörð Bjarmason
2019-05-09 14:21     ` Derrick Stolee
2019-05-08 19:27 ` [PATCH 00/17] [RFC] Commit-graph: Write incremental files Ævar Arnfjörð Bjarmason
2019-05-22 19:53 ` [PATCH v2 00/11] " Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 01/11] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 02/11] commit-graph: prepare for " Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 03/11] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 05/11] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 04/11] commit-graph: load commit-graph chains Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 06/11] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 08/11] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-05-27 11:28     ` SZEDER Gábor
2019-05-22 19:53   ` [PATCH v2 07/11] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 09/11] commit-graph: merge " Derrick Stolee via GitGitGadget
2019-05-23  0:43     ` Ævar Arnfjörð Bjarmason
2019-05-23 13:00       ` Derrick Stolee
2019-05-22 19:53   ` [PATCH v2 10/11] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-05-22 19:53   ` [PATCH v2 11/11] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-03 16:03   ` [PATCH v3 00/14] Commit-graph: Write incremental files Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 01/14] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-06-05 17:22       ` Junio C Hamano
2019-06-05 18:09         ` Derrick Stolee
2019-06-06 12:10       ` Philip Oakley
2019-06-06 17:09         ` Derrick Stolee
2019-06-06 21:59           ` Philip Oakley
2019-06-03 16:03     ` [PATCH v3 02/14] commit-graph: prepare for " Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 03/14] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 04/14] commit-graph: load commit-graph chains Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 05/14] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 06/14] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 07/14] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 08/14] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 09/14] commit-graph: merge commit-graph chains Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 10/14] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-06-03 16:03     ` [PATCH v3 11/14] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-03 16:04     ` [PATCH v3 12/14] commit-graph: create options for split files Derrick Stolee via GitGitGadget
2019-06-03 16:04     ` [PATCH v3 13/14] commit-graph: verify chains with --shallow mode Derrick Stolee via GitGitGadget
2019-06-03 16:04     ` [PATCH v3 14/14] commit-graph: clean up chains after flattened write Derrick Stolee via GitGitGadget
2019-06-06 14:15     ` [PATCH v4 00/14] Commit-graph: Write incremental files Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 01/14] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 02/14] commit-graph: prepare for " Derrick Stolee via GitGitGadget
2019-06-06 15:19         ` Philip Oakley
2019-06-06 21:28         ` Junio C Hamano
2019-06-07 12:44           ` Derrick Stolee
2019-06-06 14:15       ` [PATCH v4 03/14] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 04/14] commit-graph: load commit-graph chains Derrick Stolee via GitGitGadget
2019-06-06 22:20         ` Junio C Hamano
2019-06-07 12:53           ` Derrick Stolee
2019-06-06 14:15       ` [PATCH v4 05/14] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-06-07 18:15         ` Junio C Hamano
2019-06-06 14:15       ` [PATCH v4 06/14] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-06-07 18:23         ` Junio C Hamano
2019-06-06 14:15       ` [PATCH v4 07/14] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 08/14] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-06-07 21:57         ` Junio C Hamano
2019-06-11 12:51           ` Derrick Stolee
2019-06-11 19:45             ` Junio C Hamano
2019-06-06 14:15       ` [PATCH v4 09/14] commit-graph: merge commit-graph chains Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 10/14] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-06-06 17:00         ` Philip Oakley
2019-06-06 14:15       ` [PATCH v4 11/14] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 12/14] commit-graph: create options for split files Derrick Stolee via GitGitGadget
2019-06-06 18:41         ` Ramsay Jones
2019-06-06 14:15       ` [PATCH v4 13/14] commit-graph: verify chains with --shallow mode Derrick Stolee via GitGitGadget
2019-06-06 14:15       ` [PATCH v4 14/14] commit-graph: clean up chains after flattened write Derrick Stolee via GitGitGadget
2019-06-06 16:57       ` [PATCH v4 00/14] Commit-graph: Write incremental files Junio C Hamano
2019-06-07 12:37         ` Derrick Stolee
2019-06-07 18:38       ` [PATCH v5 00/16] " Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 01/16] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 02/16] commit-graph: prepare for " Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 03/16] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 04/16] commit-graph: load commit-graph chains Derrick Stolee via GitGitGadget
2019-06-10 21:47           ` Junio C Hamano
2019-06-10 23:41             ` Derrick Stolee
2019-06-07 18:38         ` [PATCH v5 05/16] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 07/16] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 06/16] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 08/16] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 09/16] commit-graph: merge commit-graph chains Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 10/16] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 11/16] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 12/16] commit-graph: create options for split files Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 13/16] commit-graph: verify chains with --shallow mode Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 14/16] commit-graph: clean up chains after flattened write Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 15/16] commit-graph: test octopus merges with --split Derrick Stolee via GitGitGadget
2019-06-07 18:38         ` [PATCH v5 16/16] commit-graph: test --split across alternate without --split Derrick Stolee via GitGitGadget
2019-06-17 15:02         ` [PATCH] commit-graph: normalize commit-graph filenames Derrick Stolee
2019-06-17 15:07           ` Derrick Stolee
2019-06-17 18:07           ` [PATCH v2] " Derrick Stolee
2019-06-18 18:14         ` [PATCH v6 00/18] Commit-graph: Write incremental files Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 01/18] commit-graph: document commit-graph chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 03/18] commit-graph: rename commit_compare to oid_compare Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 02/18] commit-graph: prepare for commit-graph chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 04/18] commit-graph: load " Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 05/18] commit-graph: add base graphs chunk Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 07/18] commit-graph: write commit-graph chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 06/18] commit-graph: rearrange chunk count logic Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 08/18] commit-graph: add --split option to builtin Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 09/18] commit-graph: merge commit-graph chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 10/18] commit-graph: allow cross-alternate chains Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 11/18] commit-graph: expire commit-graph files Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 13/18] commit-graph: verify chains with --shallow mode Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 12/18] commit-graph: create options for split files Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 14/18] commit-graph: clean up chains after flattened write Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 15/18] commit-graph: test octopus merges with --split Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 16/18] commit-graph: test --split across alternate without --split Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 18/18] commit-graph: test verify across alternates Derrick Stolee via GitGitGadget
2019-06-18 18:14           ` [PATCH v6 17/18] commit-graph: normalize commit-graph filenames Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7bbe8d9150a623ea684c94d129eda1607dd32a79.1557330827.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@google.com \
    --cc=peff@peff.net \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.