From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Duy Nguyen" <pclouds@gmail.com>,
"Thomas Gummerer" <t.gummerer@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: [RFC PATCH 3/5] split index: add a test to demonstrate the racy split index problem
Date: Thu, 6 Sep 2018 04:48:08 +0200 [thread overview]
Message-ID: <20180906024810.8074-4-szeder.dev@gmail.com> (raw)
In-Reply-To: <20180906024810.8074-1-szeder.dev@gmail.com>
Ever since the split index feature was introduced [1], refreshing a
split index is prone to a variant of the classic racy git problem.
There are a couple of unrelated tests in the test suite that
occasionally fail when run with 'GIT_TEST_SPLIT_INDEX=yes', but
't1700-split-index.sh', the only test script focusing solely on split
index, has never noticed this issue, because it only cares about how
the index is split under various circumstances and all the different
ways to turn the split index feature on and off.
Add a dedicated test script 't1701-racy-split-index.sh' to exercise
the split index feature in racy situations as well; kind of a
"t0010-racy-git.sh for split index" but with modern style (the tests
do everything in &&-chained list of commands in 'test_expect_...'
blocks, and use 'test_cmp' for more informative output on failure).
The tests cover the following sequences of index splitting, updating,
and racy file modifications, with case #3 demonstrating the racy split
index problem:
1. Split the index when the worktree contains a racily clean file:
echo "cached content" >file
git update-index --split-index --add file
echo "dirty worktree" >file # size stays the same
This case already works properly. Even though the cache entry's
stat data matches with the file in the worktree, subsequent git
commands will notice that the (split) index and the file have the
same mtime, and then will go on to check the file's content.
2. Split the index when it (i.e. the not yet splitted index)
contains a racily clean cache entry, i.e. an entry whose cached
stat data matches with the corresponding file in the worktree and
the cached mtime matches that of the index:
echo "cached content" >file
git update-index --add file
echo "dirty worktree" >file
# ... wait ...
git update-index --split-index --add other-file
This case already works properly. The shared index is written by
do_write_index(), i.e. the same function that is responsible for
writing "regular" and split indexes as well. This function
cleverly notices the racily clean cache entry, and writes the
entry to the new shared index with smudged stat data, i.e. file
size set to 0. When subsequent git commands read the index, they
will notice that the smudged stat data doesn't match with the
file in the worktree, and then go on to check the file's content.
3. Update the split index when the shared index contains a racily
clean cache entry:
echo "cached content" >file
git update-index --split-index --add file
echo "dirty worktree" >file
# ... wait ...
git update-index --add other-file
This case fails due to the racy split index problem. In the
second 'git update-index' prepare_to_write_split_index() gathers
all cache entries that should be written to the new split index.
Alas, this function never looks out for racily clean cache
entries, and since the file's stat data in the worktree hasn't
changed since the shared index was written, the entry won't be
replaced in the new split index. Consequently, do_write_index()
doesn't even get this racily clean cache entry, and can't smudge
its stat data. Subsequent git commands will then see that the
index has more recent mtime than the file and that the (not
smudged) cached stat data still matches with the file in the
worktree, and, ultimately, will erroneously consider the file
clean.
4. Add a racily clean file to an already split index:
git update-index --split-index
echo "cached content" >file
git update-index --add file
echo "dirty worktree" >file
This case already works properly. After the second 'git
update-index' writes the newly added file's cache entry to the
new split index, it basically works in the same way as case #1.
5. Update the split index when it contains a racily clean cache
entry:
git update-index --split-index
echo "cached content" >file
git update-index --add file
echo "dirty worktree" >file
# ... wait ...
git update-index --add other-file
This case already works properly. Since the second 'git
update-index' writes the newly added file's cache entry to the
split index, the third 'git update-index's do_write_index() now
has a chance to look at this entry, notices its raciness, and
writes smudged stat data to the new split index.
[1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge
branch 'nd/split-index', 2014-07-16).
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
t/t1701-racy-split-index.sh | 164 ++++++++++++++++++++++++++++++++++++
1 file changed, 164 insertions(+)
create mode 100755 t/t1701-racy-split-index.sh
diff --git a/t/t1701-racy-split-index.sh b/t/t1701-racy-split-index.sh
new file mode 100755
index 0000000000..c92ba9ce5e
--- /dev/null
+++ b/t/t1701-racy-split-index.sh
@@ -0,0 +1,164 @@
+#!/bin/sh
+
+# This test can give false success if your machine is sufficiently
+# slow or all trials happened to happen on second boundaries.
+
+test_description='racy split index'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+ # Only split the index when the test explicitly says so.
+ sane_unset GIT_TEST_SPLIT_INDEX GIT_FSMONITOR_TEST &&
+ git config splitIndex.maxPercentChange 100 &&
+
+ echo something >other-file &&
+ # No raciness with this file.
+ test-tool chmtime =-20 other-file
+'
+
+check_cached_diff () {
+ git diff-index --patch --cached $EMPTY_TREE racy-file >diff &&
+ tail -1 diff >actual &&
+ echo "+cached content" >expect &&
+ test_cmp expect actual
+}
+
+for trial in 0 1 2 3 4
+do
+ test_expect_success "split the index when the worktree contains a racily clean file #$trial" '
+ test_when_finished "rm -f .git/index .git/sharedindex.*" &&
+
+ # The next three commands must be run within the same
+ # second (so both writes to racy-file result in the same
+ # mtime) to create the interesting racy situation.
+ echo "cached content" >racy-file &&
+
+ # Update and split the index. The cache entry of
+ # racy-file will be stored in the shared index.
+ git update-index --split-index --add racy-file &&
+
+ # File size must stay the same.
+ echo "dirty worktree" >racy-file &&
+
+ check_cached_diff
+ '
+done
+
+for trial in 0 1 2 3 4
+do
+ test_expect_success "split the index when it contains a racily clean cache entry #$trial" '
+ test_when_finished "rm -f .git/index .git/sharedindex.*" &&
+
+ # The next three commands must be run within the same
+ # second.
+ echo "cached content" >racy-file &&
+
+ git update-index --add racy-file &&
+
+ # File size must stay the same.
+ echo "dirty worktree" >racy-file &&
+
+ # Now wait a bit to ensure that the split index written
+ # below will get a more recent mtime than racy-file,
+ # and, consequently, subsequent git commands wont
+ # consider the entry racily clean.
+ sleep 1 &&
+
+ # Update and split the index when it contains the
+ # racily clean cache entry of racy-file; the stat data
+ # in that entry should be smudged, so the file wont
+ # appear clean for subsequent git commands.
+ git update-index --split-index --add other-file &&
+
+ check_cached_diff
+ '
+done
+
+for trial in 0 1 2 3 4
+do
+ test_expect_failure "update the split index when the shared index contains a racily clean cache entry #$trial" '
+ test_when_finished "rm -f .git/index .git/sharedindex.*" &&
+
+ # The next three commands must be run within the same
+ # second.
+ echo "cached content" >racy-file &&
+
+ # Update and split the index. The cache entry of
+ # racy-file will be stored in the shared index.
+ git update-index --split-index --add racy-file &&
+
+ # File size must stay the same.
+ echo "dirty worktree" >racy-file &&
+
+ # Now wait a bit to ensure that the split index written
+ # below will get a more recent mtime than racy-file.
+ sleep 1 &&
+
+ # Update the split index when the shared index contains
+ # the racily clean cache entry of racy-file. A
+ # corresponding replacement cache entry with smudged
+ # stat data should be added to the new split index, so
+ # the file wont appear clean for subsequent git commands.
+ #
+ # Alas, such a smudged replacement entry is not added!
+ git update-index --add other-file &&
+
+ check_cached_diff
+ '
+done
+
+for trial in 0 1 2 3 4
+do
+ test_expect_success "add a racily clean file to an already split index #$trial" '
+ test_when_finished "rm -f .git/index .git/sharedindex.*" &&
+
+ git update-index --split-index &&
+
+ # The next three commands must be run within the same
+ # second.
+ echo "cached content" >racy-file &&
+
+ # Update the split index. The cache entry of racy-file
+ # will be stored in the split index.
+ git update-index --add racy-file &&
+
+ # File size must stay the same.
+ echo "dirty worktree" >racy-file &&
+
+ check_cached_diff
+ '
+done
+
+for trial in 0 1 2 3 4
+do
+ test_expect_success "update the split index when it contains a racily clean cache entry #$trial" '
+ test_when_finished "rm -f .git/index .git/sharedindex.*" &&
+
+ git update-index --split-index &&
+
+ # The next three commands must be run within the same
+ # second.
+ echo "cached content" >racy-file &&
+
+ # Update the split index. The cache entry of racy-file
+ # will be stored in the split index.
+ git update-index --add racy-file &&
+
+ # File size must stay the same.
+ echo "dirty worktree" >racy-file &&
+
+ # Now wait a bit to ensure that the split index written
+ # below will get a more recent mtime than racy-file.
+ sleep 1 &&
+
+ # Update the split index when it contains the racily
+ # clean cache entry of racy-file; the stat data in that
+ # entry should be smudged.
+ git update-index --add other-file &&
+
+ check_cached_diff
+ '
+done
+
+test_done
--
2.19.0.rc0.188.g56c5ee2db1
next prev parent reply other threads:[~2018-09-06 2:48 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-09-06 2:48 [RFC PATCH 0/5] Fix the racy split index problem SZEDER Gábor
2018-09-06 2:48 ` [PATCH 1/5] t1700-split-index: drop unnecessary 'grep' SZEDER Gábor
2018-09-06 21:24 ` Junio C Hamano
2018-09-08 13:50 ` Duy Nguyen
2018-09-06 2:48 ` [PATCH 2/5] t0090: disable GIT_TEST_SPLIT_INDEX for the test checking split index SZEDER Gábor
2018-09-06 8:03 ` Ævar Arnfjörð Bjarmason
2018-09-06 2:48 ` SZEDER Gábor [this message]
2018-09-06 2:48 ` [RFC PATCH 4/5] t1700-split-index: date back files to avoid racy situations SZEDER Gábor
2018-09-06 8:02 ` Ævar Arnfjörð Bjarmason
2018-09-06 9:15 ` SZEDER Gábor
2018-09-06 9:20 ` Ævar Arnfjörð Bjarmason
2018-09-06 2:48 ` [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index SZEDER Gábor
2018-09-06 10:26 ` Ævar Arnfjörð Bjarmason
2018-09-06 12:26 ` Ævar Arnfjörð Bjarmason
2018-09-06 15:14 ` SZEDER Gábor
2018-09-06 15:26 ` Ævar Arnfjörð Bjarmason
2018-09-06 17:53 ` Ævar Arnfjörð Bjarmason
2018-09-07 3:49 ` SZEDER Gábor
2018-09-10 22:12 ` Paul-Sebastian Ungureanu
2018-09-08 16:45 ` Duy Nguyen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180906024810.8074-4-szeder.dev@gmail.com \
--to=szeder.dev@gmail.com \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=pclouds@gmail.com \
--cc=t.gummerer@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).