git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/5] Fix the racy split index problem
@ 2018-09-06  2:48 SZEDER Gábor
  2018-09-06  2:48 ` [PATCH 1/5] t1700-split-index: drop unnecessary 'grep' SZEDER Gábor
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: SZEDER Gábor @ 2018-09-06  2:48 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Duy Nguyen, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, SZEDER Gábor

Ever since the split index feature was introduced, refreshing a split
index is prone to a variant of the classic racy git problem, which
caused occasional failures in several random test scripts when run
with 'GIT_TEST_SPLIT_INDEX=yes'.  I won't go into details here, there
is plenty of text to be read already in the commit messages of patches
3 and 5.

RFC, because I'm not sure that the fix in patch 5 is the best fix for
this issue.  ... and because I usually don't dive this deep into core
stuff during my Git hacking adventures, so who knows what subtleties I
might have missed.


SZEDER Gábor (5):
  t1700-split-index: drop unnecessary 'grep'
  t0090: disable GIT_TEST_SPLIT_INDEX for the test checking split index
  split index: add a test to demonstrate the racy split index problem
  t1700-split-index: date back files to avoid racy situations
  split-index: smudge and add racily clean cache entries to split index

 cache.h                     |   2 +
 read-cache.c                |   2 +-
 split-index.c               |  12 ++-
 t/t0090-cache-tree.sh       |  18 ++--
 t/t1700-split-index.sh      |  51 +++++++-----
 t/t1701-racy-split-index.sh | 162 ++++++++++++++++++++++++++++++++++++
 6 files changed, 216 insertions(+), 31 deletions(-)
 create mode 100755 t/t1701-racy-split-index.sh

-- 
2.19.0.rc0.188.g56c5ee2db1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/5] t1700-split-index: drop unnecessary 'grep'
  2018-09-06  2:48 [RFC PATCH 0/5] Fix the racy split index problem SZEDER Gábor
@ 2018-09-06  2:48 ` SZEDER Gábor
  2018-09-06 21:24   ` Junio C Hamano
  2018-09-08 13:50   ` Duy Nguyen
  2018-09-06  2:48 ` [PATCH 2/5] t0090: disable GIT_TEST_SPLIT_INDEX for the test checking split index SZEDER Gábor
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 20+ messages in thread
From: SZEDER Gábor @ 2018-09-06  2:48 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Duy Nguyen, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, SZEDER Gábor

The test 'disable split index' in 't1700-split-index.sh' runs the
following pipeline:

  cmd | grep <pattern> | sed s///

Drop that 'grep' from the pipeline, and let 'sed' take over its
duties.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t1700-split-index.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index b3b4d83eaf..be22398a85 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -57,7 +57,7 @@ test_expect_success 'disable split index' '
 	EOF
 	test_cmp ls-files.expect ls-files.actual &&
 
-	BASE=$(test-tool dump-split-index .git/index | grep "^own" | sed "s/own/base/") &&
+	BASE=$(test-tool dump-split-index .git/index | sed -n "s/^own/base/p") &&
 	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
 	cat >expect <<-EOF &&
 	not a split index
-- 
2.19.0.rc0.188.g56c5ee2db1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/5] t0090: disable GIT_TEST_SPLIT_INDEX for the test checking split index
  2018-09-06  2:48 [RFC PATCH 0/5] Fix the racy split index problem SZEDER Gábor
  2018-09-06  2:48 ` [PATCH 1/5] t1700-split-index: drop unnecessary 'grep' SZEDER Gábor
@ 2018-09-06  2:48 ` SZEDER Gábor
  2018-09-06  8:03   ` Ævar Arnfjörð Bjarmason
  2018-09-06  2:48 ` [RFC PATCH 3/5] split index: add a test to demonstrate the racy split index problem SZEDER Gábor
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: SZEDER Gábor @ 2018-09-06  2:48 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Duy Nguyen, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, SZEDER Gábor

The test 'switching trees does not invalidate shared index' in
't0090-cache-tree.sh' is about verifying the behaviour of the split
index feature, therefore it should be in full control of when index
splitting is performed, like all the tests in 't1700-split-index.sh'.

Unset GIT_TEST_SPLIT_INDEX for this test to avoid unintended random
index splitting.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t0090-cache-tree.sh | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh
index 7de40141ca..65b5152b9b 100755
--- a/t/t0090-cache-tree.sh
+++ b/t/t0090-cache-tree.sh
@@ -242,14 +242,18 @@ test_expect_success 'no phantom error when switching trees' '
 	test_must_be_empty errors
 '
 
+
 test_expect_success 'switching trees does not invalidate shared index' '
-	git update-index --split-index &&
-	>split &&
-	git add split &&
-	test-tool dump-split-index .git/index | grep -v ^own >before &&
-	git commit -m "as-is" &&
-	test-tool dump-split-index .git/index | grep -v ^own >after &&
-	test_cmp before after
+	(
+		sane_unset GIT_TEST_SPLIT_INDEX &&
+		git update-index --split-index &&
+		>split &&
+		git add split &&
+		test-tool dump-split-index .git/index | grep -v ^own >before &&
+		git commit -m "as-is" &&
+		test-tool dump-split-index .git/index | grep -v ^own >after &&
+		test_cmp before after
+	)
 '
 
 test_done
-- 
2.19.0.rc0.188.g56c5ee2db1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH 3/5] split index: add a test to demonstrate the racy split index problem
  2018-09-06  2:48 [RFC PATCH 0/5] Fix the racy split index problem SZEDER Gábor
  2018-09-06  2:48 ` [PATCH 1/5] t1700-split-index: drop unnecessary 'grep' SZEDER Gábor
  2018-09-06  2:48 ` [PATCH 2/5] t0090: disable GIT_TEST_SPLIT_INDEX for the test checking split index SZEDER Gábor
@ 2018-09-06  2:48 ` SZEDER Gábor
  2018-09-06  2:48 ` [RFC PATCH 4/5] t1700-split-index: date back files to avoid racy situations SZEDER Gábor
  2018-09-06  2:48 ` [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index SZEDER Gábor
  4 siblings, 0 replies; 20+ messages in thread
From: SZEDER Gábor @ 2018-09-06  2:48 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Duy Nguyen, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, SZEDER Gábor

Ever since the split index feature was introduced [1], refreshing a
split index is prone to a variant of the classic racy git problem.
There are a couple of unrelated tests in the test suite that
occasionally fail when run with 'GIT_TEST_SPLIT_INDEX=yes', but
't1700-split-index.sh', the only test script focusing solely on split
index, has never noticed this issue, because it only cares about how
the index is split under various circumstances and all the different
ways to turn the split index feature on and off.

Add a dedicated test script 't1701-racy-split-index.sh' to exercise
the split index feature in racy situations as well; kind of a
"t0010-racy-git.sh for split index" but with modern style (the tests
do everything in &&-chained list of commands in 'test_expect_...'
blocks, and use 'test_cmp' for more informative output on failure).

The tests cover the following sequences of index splitting, updating,
and racy file modifications, with case #3 demonstrating the racy split
index problem:

  1. Split the index when the worktree contains a racily clean file:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file    # size stays the same

     This case already works properly.  Even though the cache entry's
     stat data matches with the file in the worktree, subsequent git
     commands will notice that the (split) index and the file have the
     same mtime, and then will go on to check the file's content.

  2. Split the index when it (i.e. the not yet splitted index)
     contains a racily clean cache entry, i.e. an entry whose cached
     stat data matches with the corresponding file in the worktree and
     the cached mtime matches that of the index:

       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --split-index --add other-file

     This case already works properly.  The shared index is written by
     do_write_index(), i.e. the same function that is responsible for
     writing "regular" and split indexes as well.  This function
     cleverly notices the racily clean cache entry, and writes the
     entry to the new shared index with smudged stat data, i.e. file
     size set to 0.  When subsequent git commands read the index, they
     will notice that the smudged stat data doesn't match with the
     file in the worktree, and then go on to check the file's content.

  3. Update the split index when the shared index contains a racily
     clean cache entry:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --add other-file

     This case fails due to the racy split index problem.  In the
     second 'git update-index' prepare_to_write_split_index() gathers
     all cache entries that should be written to the new split index.
     Alas, this function never looks out for racily clean cache
     entries, and since the file's stat data in the worktree hasn't
     changed since the shared index was written, the entry won't be
     replaced in the new split index.  Consequently, do_write_index()
     doesn't even get this racily clean cache entry, and can't smudge
     its stat data.  Subsequent git commands will then see that the
     index has more recent mtime than the file and that the (not
     smudged) cached stat data still matches with the file in the
     worktree, and, ultimately, will erroneously consider the file
     clean.

  4. Add a racily clean file to an already split index:

       git update-index --split-index
       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file

     This case already works properly.  After the second 'git
     update-index' writes the newly added file's cache entry to the
     new split index, it basically works in the same way as case #1.

  5. Update the split index when it contains a racily clean cache
     entry:

       git update-index --split-index
       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --add other-file

     This case already works properly.  Since the second 'git
     update-index' writes the newly added file's cache entry to the
     split index, the third 'git update-index's do_write_index() now
     has a chance to look at this entry, notices its raciness, and
     writes smudged stat data to the new split index.

[1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge
    branch 'nd/split-index', 2014-07-16).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t1701-racy-split-index.sh | 164 ++++++++++++++++++++++++++++++++++++
 1 file changed, 164 insertions(+)
 create mode 100755 t/t1701-racy-split-index.sh

diff --git a/t/t1701-racy-split-index.sh b/t/t1701-racy-split-index.sh
new file mode 100755
index 0000000000..c92ba9ce5e
--- /dev/null
+++ b/t/t1701-racy-split-index.sh
@@ -0,0 +1,164 @@
+#!/bin/sh
+
+# This test can give false success if your machine is sufficiently
+# slow or all trials happened to happen on second boundaries.
+
+test_description='racy split index'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	# Only split the index when the test explicitly says so.
+	sane_unset GIT_TEST_SPLIT_INDEX GIT_FSMONITOR_TEST &&
+	git config splitIndex.maxPercentChange 100 &&
+
+	echo something >other-file &&
+	# No raciness with this file.
+	test-tool chmtime =-20 other-file
+'
+
+check_cached_diff () {
+	git diff-index --patch --cached $EMPTY_TREE racy-file >diff &&
+	tail -1 diff >actual &&
+	echo "+cached content" >expect &&
+	test_cmp expect actual
+}
+
+for trial in 0 1 2 3 4
+do
+	test_expect_success "split the index when the worktree contains a racily clean file #$trial" '
+		test_when_finished "rm -f .git/index .git/sharedindex.*" &&
+
+		# The next three commands must be run within the same
+		# second (so both writes to racy-file result in the same
+		# mtime) to create the interesting racy situation.
+		echo "cached content" >racy-file &&
+
+		# Update and split the index.  The cache entry of
+		# racy-file will be stored in the shared index.
+		git update-index --split-index --add racy-file &&
+
+		# File size must stay the same.
+		echo "dirty worktree" >racy-file &&
+
+		check_cached_diff
+	'
+done
+
+for trial in 0 1 2 3 4
+do
+	test_expect_success "split the index when it contains a racily clean cache entry #$trial" '
+		test_when_finished "rm -f .git/index .git/sharedindex.*" &&
+
+		# The next three commands must be run within the same
+		# second.
+		echo "cached content" >racy-file &&
+
+		git update-index --add racy-file &&
+
+		# File size must stay the same.
+		echo "dirty worktree" >racy-file &&
+
+		# Now wait a bit to ensure that the split index written
+		# below will get a more recent mtime than racy-file,
+		# and, consequently, subsequent git commands wont
+		# consider the entry racily clean.
+		sleep 1 &&
+
+		# Update and split the index when it contains the
+		# racily clean cache entry of racy-file; the stat data
+		# in that entry should be smudged, so the file wont
+		# appear clean for subsequent git commands.
+		git update-index --split-index --add other-file &&
+
+		check_cached_diff
+	'
+done
+
+for trial in 0 1 2 3 4
+do
+	test_expect_failure "update the split index when the shared index contains a racily clean cache entry #$trial" '
+		test_when_finished "rm -f .git/index .git/sharedindex.*" &&
+
+		# The next three commands must be run within the same
+		# second.
+		echo "cached content" >racy-file &&
+
+		# Update and split the index.  The cache entry of
+		# racy-file will be stored in the shared index.
+		git update-index --split-index --add racy-file &&
+
+		# File size must stay the same.
+		echo "dirty worktree" >racy-file &&
+
+		# Now wait a bit to ensure that the split index written
+		# below will get a more recent mtime than racy-file.
+		sleep 1 &&
+
+		# Update the split index when the shared index contains
+		# the racily clean cache entry of racy-file.  A
+		# corresponding replacement cache entry with smudged
+		# stat data should be added to the new split index, so
+		# the file wont appear clean for subsequent git commands.
+		#
+		# Alas, such a smudged replacement entry is not added!
+		git update-index --add other-file &&
+
+		check_cached_diff
+	'
+done
+
+for trial in 0 1 2 3 4
+do
+	test_expect_success "add a racily clean file to an already split index #$trial" '
+		test_when_finished "rm -f .git/index .git/sharedindex.*" &&
+
+		git update-index --split-index &&
+
+		# The next three commands must be run within the same
+		# second.
+		echo "cached content" >racy-file &&
+
+		# Update the split index.  The cache entry of racy-file
+		# will be stored in the split index.
+		git update-index --add racy-file &&
+
+		# File size must stay the same.
+		echo "dirty worktree" >racy-file &&
+
+		check_cached_diff
+	'
+done
+
+for trial in 0 1 2 3 4
+do
+	test_expect_success "update the split index when it contains a racily clean cache entry #$trial" '
+		test_when_finished "rm -f .git/index .git/sharedindex.*" &&
+
+		git update-index --split-index &&
+
+		# The next three commands must be run within the same
+		# second.
+		echo "cached content" >racy-file &&
+
+		# Update the split index.  The cache entry of racy-file
+		# will be stored in the split index.
+		git update-index --add racy-file &&
+
+		# File size must stay the same.
+		echo "dirty worktree" >racy-file &&
+
+		# Now wait a bit to ensure that the split index written
+		# below will get a more recent mtime than racy-file.
+		sleep 1 &&
+
+		# Update the split index when it contains the racily
+		# clean cache entry of racy-file; the stat data in that
+		# entry should be smudged.
+		git update-index --add other-file &&
+
+		check_cached_diff
+	'
+done
+
+test_done
-- 
2.19.0.rc0.188.g56c5ee2db1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH 4/5] t1700-split-index: date back files to avoid racy situations
  2018-09-06  2:48 [RFC PATCH 0/5] Fix the racy split index problem SZEDER Gábor
                   ` (2 preceding siblings ...)
  2018-09-06  2:48 ` [RFC PATCH 3/5] split index: add a test to demonstrate the racy split index problem SZEDER Gábor
@ 2018-09-06  2:48 ` SZEDER Gábor
  2018-09-06  8:02   ` Ævar Arnfjörð Bjarmason
  2018-09-06  2:48 ` [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index SZEDER Gábor
  4 siblings, 1 reply; 20+ messages in thread
From: SZEDER Gábor @ 2018-09-06  2:48 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Duy Nguyen, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, SZEDER Gábor

't1700-split-index.sh' checks that the index was split correctly under
various circumstances and that all the different ways to turn the
split index feature on and off work correctly.  To do so, most of its
tests use 'test-tool dump-split-index' to see which files have their
cache entries in the split index.  All these tests assume that all
cache entries are written to the shared index (called "base"
throughout these tests) when a new shared index is created.  This is
an implementation detail: most git commands (basically all except 'git
update-index') don't care or know at all about split index or whether
a cache entry is stored in the split or shared index.

As demonstrated in the previous patch, refreshing a split index is
prone to a variant of the classic racy git issue.  The next patch will
fix this issue, but while doing so it will also slightly change this
behaviour: only cache entries with mtime in the past will be written
only to the newly created shared index, but racily clean cache entries
will be written to the new split index (with smudged stat data).

While this upcoming change won't at all affect any git commands, it
will violate the above mentioned assumption of 't1700's tests.  Since
these tests create or modify files and create or refresh the split
index in rapid succession, there are plenty of racily clean cache
entries to be dealt with, which will then be written to the new split
indexes, and, ultimately, will cause several tests in 't1700' to fail.

Let's prepare 't1700-split-index.sh' for this upcoming change and
modify its tests to avoid racily clean files by backdating the mtime
of any file modifications (and since a lot of tests create or modify
files, encapsulate it into a helper function).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t1700-split-index.sh | 49 ++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index be22398a85..e7beef1e09 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -8,6 +8,13 @@ test_description='split index mode tests'
 sane_unset GIT_TEST_SPLIT_INDEX
 sane_unset GIT_FSMONITOR_TEST
 
+# Create a file named as $1 with content read from stdin.
+# Set the file's mtime to a few seconds in the past to avoid racy situations.
+create_file () {
+	cat >"$1" &&
+	test-tool chmtime =-5 "$1"
+}
+
 test_expect_success 'enable split index' '
 	git config splitIndex.maxPercentChange 100 &&
 	git update-index --split-index &&
@@ -31,7 +38,7 @@ test_expect_success 'enable split index' '
 '
 
 test_expect_success 'add one file' '
-	: >one &&
+	create_file one &&
 	git update-index --add one &&
 	git ls-files --stage >ls-files.actual &&
 	cat >ls-files.expect <<-EOF &&
@@ -83,7 +90,7 @@ test_expect_success 'enable split index again, "one" now belongs to base index"'
 '
 
 test_expect_success 'modify original file, base index untouched' '
-	echo modified >one &&
+	echo modified |create_file one &&
 	git update-index one &&
 	git ls-files --stage >ls-files.actual &&
 	cat >ls-files.expect <<-EOF &&
@@ -102,7 +109,7 @@ test_expect_success 'modify original file, base index untouched' '
 '
 
 test_expect_success 'add another file, which stays index' '
-	: >two &&
+	create_file two &&
 	git update-index --add two &&
 	git ls-files --stage >ls-files.actual &&
 	cat >ls-files.expect <<-EOF &&
@@ -155,7 +162,7 @@ test_expect_success 'remove file in base index' '
 '
 
 test_expect_success 'add original file back' '
-	: >one &&
+	create_file one &&
 	git update-index --add one &&
 	git ls-files --stage >ls-files.actual &&
 	cat >ls-files.expect <<-EOF &&
@@ -174,7 +181,7 @@ test_expect_success 'add original file back' '
 '
 
 test_expect_success 'add new file' '
-	: >two &&
+	create_file two &&
 	git update-index --add two &&
 	git ls-files --stage >actual &&
 	cat >expect <<-EOF &&
@@ -218,7 +225,7 @@ test_expect_success 'rev-parse --shared-index-path' '
 
 test_expect_success 'set core.splitIndex config variable to true' '
 	git config core.splitIndex true &&
-	: >three &&
+	create_file three &&
 	git update-index --add three &&
 	git ls-files --stage >ls-files.actual &&
 	cat >ls-files.expect <<-EOF &&
@@ -253,9 +260,9 @@ test_expect_success 'set core.splitIndex config variable to false' '
 	test_cmp expect actual
 '
 
-test_expect_success 'set core.splitIndex config variable to true' '
+test_expect_success 'set core.splitIndex config variable back to true' '
 	git config core.splitIndex true &&
-	: >three &&
+	create_file three &&
 	git update-index --add three &&
 	BASE=$(test-tool dump-split-index .git/index | grep "^base") &&
 	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
@@ -265,7 +272,7 @@ test_expect_success 'set core.splitIndex config variable to true' '
 	deletions:
 	EOF
 	test_cmp expect actual &&
-	: >four &&
+	create_file four &&
 	git update-index --add four &&
 	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
 	cat >expect <<-EOF &&
@@ -279,7 +286,7 @@ test_expect_success 'set core.splitIndex config variable to true' '
 
 test_expect_success 'check behavior with splitIndex.maxPercentChange unset' '
 	git config --unset splitIndex.maxPercentChange &&
-	: >five &&
+	create_file five &&
 	git update-index --add five &&
 	BASE=$(test-tool dump-split-index .git/index | grep "^base") &&
 	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
@@ -289,7 +296,7 @@ test_expect_success 'check behavior with splitIndex.maxPercentChange unset' '
 	deletions:
 	EOF
 	test_cmp expect actual &&
-	: >six &&
+	create_file six &&
 	git update-index --add six &&
 	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
 	cat >expect <<-EOF &&
@@ -303,7 +310,7 @@ test_expect_success 'check behavior with splitIndex.maxPercentChange unset' '
 
 test_expect_success 'check splitIndex.maxPercentChange set to 0' '
 	git config splitIndex.maxPercentChange 0 &&
-	: >seven &&
+	create_file seven &&
 	git update-index --add seven &&
 	BASE=$(test-tool dump-split-index .git/index | grep "^base") &&
 	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
@@ -313,7 +320,7 @@ test_expect_success 'check splitIndex.maxPercentChange set to 0' '
 	deletions:
 	EOF
 	test_cmp expect actual &&
-	: >eight &&
+	create_file eight &&
 	git update-index --add eight &&
 	BASE=$(test-tool dump-split-index .git/index | grep "^base") &&
 	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
@@ -326,17 +333,17 @@ test_expect_success 'check splitIndex.maxPercentChange set to 0' '
 '
 
 test_expect_success 'shared index files expire after 2 weeks by default' '
-	: >ten &&
+	create_file ten &&
 	git update-index --add ten &&
 	test $(ls .git/sharedindex.* | wc -l) -gt 2 &&
 	just_under_2_weeks_ago=$((5-14*86400)) &&
 	test-tool chmtime =$just_under_2_weeks_ago .git/sharedindex.* &&
-	: >eleven &&
+	create_file eleven &&
 	git update-index --add eleven &&
 	test $(ls .git/sharedindex.* | wc -l) -gt 2 &&
 	just_over_2_weeks_ago=$((-1-14*86400)) &&
 	test-tool chmtime =$just_over_2_weeks_ago .git/sharedindex.* &&
-	: >twelve &&
+	create_file twelve &&
 	git update-index --add twelve &&
 	test $(ls .git/sharedindex.* | wc -l) -le 2
 '
@@ -344,12 +351,12 @@ test_expect_success 'shared index files expire after 2 weeks by default' '
 test_expect_success 'check splitIndex.sharedIndexExpire set to 16 days' '
 	git config splitIndex.sharedIndexExpire "16.days.ago" &&
 	test-tool chmtime =$just_over_2_weeks_ago .git/sharedindex.* &&
-	: >thirteen &&
+	create_file thirteen &&
 	git update-index --add thirteen &&
 	test $(ls .git/sharedindex.* | wc -l) -gt 2 &&
 	just_over_16_days_ago=$((-1-16*86400)) &&
 	test-tool chmtime =$just_over_16_days_ago .git/sharedindex.* &&
-	: >fourteen &&
+	create_file fourteen &&
 	git update-index --add fourteen &&
 	test $(ls .git/sharedindex.* | wc -l) -le 2
 '
@@ -358,13 +365,13 @@ test_expect_success 'check splitIndex.sharedIndexExpire set to "never" and "now"
 	git config splitIndex.sharedIndexExpire never &&
 	just_10_years_ago=$((-365*10*86400)) &&
 	test-tool chmtime =$just_10_years_ago .git/sharedindex.* &&
-	: >fifteen &&
+	create_file fifteen &&
 	git update-index --add fifteen &&
 	test $(ls .git/sharedindex.* | wc -l) -gt 2 &&
 	git config splitIndex.sharedIndexExpire now &&
 	just_1_second_ago=-1 &&
 	test-tool chmtime =$just_1_second_ago .git/sharedindex.* &&
-	: >sixteen &&
+	create_file sixteen &&
 	git update-index --add sixteen &&
 	test $(ls .git/sharedindex.* | wc -l) -le 2
 '
@@ -379,7 +386,7 @@ do
 		# Create one new shared index file
 		git config core.sharedrepository "$mode" &&
 		git config core.splitIndex true &&
-		: >one &&
+		create_file one &&
 		git update-index --add one &&
 		echo "$modebits" >expect &&
 		test_modebits .git/index >actual &&
-- 
2.19.0.rc0.188.g56c5ee2db1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index
  2018-09-06  2:48 [RFC PATCH 0/5] Fix the racy split index problem SZEDER Gábor
                   ` (3 preceding siblings ...)
  2018-09-06  2:48 ` [RFC PATCH 4/5] t1700-split-index: date back files to avoid racy situations SZEDER Gábor
@ 2018-09-06  2:48 ` SZEDER Gábor
  2018-09-06 10:26   ` Ævar Arnfjörð Bjarmason
                     ` (2 more replies)
  4 siblings, 3 replies; 20+ messages in thread
From: SZEDER Gábor @ 2018-09-06  2:48 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Duy Nguyen, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason, SZEDER Gábor

Ever since the split index feature was introduced [1], refreshing a
split index is prone to a variant of the classic racy git problem.

Consider the following sequence of commands updating the split index
when the shared index contains a racily clean cache entry, i.e. an
entry whose cached stat data matches with the corresponding file in
the worktree and the cached mtime matches that of the index:

  echo "cached content" >file
  git update-index --split-index --add file
  echo "dirty worktree" >file    # size stays the same!
  # ... wait ...
  git update-index --add other-file

Normally, when a non-split index is updated, then do_write_index()
(the function responsible for writing all kinds of indexes, "regular",
split, and shared) recognizes racily clean cache entries, and writes
them with smudged stat data, i.e. with file size set to 0.  When
subsequent git commands read the index, they will notice that the
smudged stat data doesn't match with the file in the worktree, and
then go on to check the file's content.

In the above example, however, in the second 'git update-index'
prepare_to_write_split_index() gathers all cache entries that should
be written to the new split index.  Alas, this function never looks
out for racily clean cache entries, and since the file's stat data in
the worktree hasn't changed since the shared index was written, it
won't be replaced in the new split index.  Consequently,
do_write_index() doesn't even get this racily clean cache entry, and
can't smudge its stat data.  Subsequent git commands will then see
that the index has more recent mtime than the file and that the (not
smudged) cached stat data still matches with the file in the worktree,
and, ultimately, will erroneously consider the file clean.

Modify prepare_to_write_split_index() to recognize racily clean cache
entries, and mark them to be added to the split index.  This way
do_write_index() will get these racily clean cache entries as well,
and will then write them with smudged stat data to the new split
index.

Note that after this change if the index is split when it contains a
racily clean cache entry, then a smudged cache entry will be written
both to the new shared and to the new split indexes.  This doesn't
affect regular git commands: as far as they are concerned this is just
an entry in the split index replacing an outdated entry in the shared
index.  It did affect a few tests in 't1700-split-index.sh', though,
because they actually check which entries are stored in the split
index; the previous patch made the necessary adjustments.  And racily
clean cache entries and index splitting are rare enough to not worry
about the resulting duplicated smudged cache entries, and the
additional complexity required to prevent them is not worth it.

Several tests failed occasionally when the test suite was run with
'GIT_TEST_SPLIT_INDEX=yes'.  Here are those that I managed to trace
back to this racy split index problem, starting with those failing
more frequently, with a link to a failing Travis CI build job for
each.  The highlighted line shows when the racy file was written,
which is not always in the failing test (but note that those lines are
in the 'after failure' fold, and your browser might unhelpfully fold
it up before you could take a good look).

  t3903-stash.sh:
    https://travis-ci.org/git/git/jobs/385542084#L5858

  t4024-diff-optimize-common.sh:
    https://travis-ci.org/git/git/jobs/386531969#L3174

  t4015-diff-whitespace.sh:
    https://travis-ci.org/git/git/jobs/360797600#L8215

  t2200-add-update.sh:
    https://travis-ci.org/git/git/jobs/382543426#L3051

  t0090-cache-tree:
    https://travis-ci.org/git/git/jobs/416583010#L3679

There might be others, e.g. perhaps 't1000-read-tree-m-3way.sh' and
others using 'lib-read-tree-m-3way.sh', but I couldn't confirm yet.

[1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge
    branch 'nd/split-index', 2014-07-16).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 cache.h                     |  2 ++
 read-cache.c                |  2 +-
 split-index.c               | 12 +++++++++++-
 t/t1701-racy-split-index.sh |  4 +---
 4 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 4d014541ab..3f419b6c79 100644
--- a/cache.h
+++ b/cache.h
@@ -781,6 +781,8 @@ extern void *read_blob_data_from_index(const struct index_state *, const char *,
 #define CE_MATCH_REFRESH		0x10
 /* don't refresh_fsmonitor state or do stat comparison even if CE_FSMONITOR_VALID is true */
 #define CE_MATCH_IGNORE_FSMONITOR 0X20
+extern int is_racy_timestamp(const struct index_state *istate,
+			     const struct cache_entry *ce);
 extern int ie_match_stat(struct index_state *, const struct cache_entry *, struct stat *, unsigned int);
 extern int ie_modified(struct index_state *, const struct cache_entry *, struct stat *, unsigned int);
 
diff --git a/read-cache.c b/read-cache.c
index 7b1354d759..8f644f68b4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -337,7 +337,7 @@ static int is_racy_stat(const struct index_state *istate,
 		);
 }
 
-static int is_racy_timestamp(const struct index_state *istate,
+int is_racy_timestamp(const struct index_state *istate,
 			     const struct cache_entry *ce)
 {
 	return (!S_ISGITLINK(ce->ce_mode) &&
diff --git a/split-index.c b/split-index.c
index 84f067e10d..b4458f31f6 100644
--- a/split-index.c
+++ b/split-index.c
@@ -220,8 +220,18 @@ void prepare_to_write_split_index(struct index_state *istate)
 			}
 			ce->ce_flags |= CE_MATCHED; /* or "shared" */
 			base = si->base->cache[ce->index - 1];
-			if (ce == base)
+			if (ce == base) {
+				/*
+				 * Racily clean cache entries must be
+				 * written to the split index, so the
+				 * subsequent do_write_index() can smudge
+				 * their stat data.
+				 */
+				if (!ce_uptodate(ce) &&
+				    is_racy_timestamp(istate, ce))
+					ce->ce_flags |= CE_UPDATE_IN_BASE;
 				continue;
+			}
 			if (ce->ce_namelen != base->ce_namelen ||
 			    strcmp(ce->name, base->name)) {
 				ce->index = 0;
diff --git a/t/t1701-racy-split-index.sh b/t/t1701-racy-split-index.sh
index c92ba9ce5e..6780d08367 100755
--- a/t/t1701-racy-split-index.sh
+++ b/t/t1701-racy-split-index.sh
@@ -77,7 +77,7 @@ done
 
 for trial in 0 1 2 3 4
 do
-	test_expect_failure "update the split index when the shared index contains a racily clean cache entry #$trial" '
+	test_expect_success "update the split index when the shared index contains a racily clean cache entry #$trial" '
 		test_when_finished "rm -f .git/index .git/sharedindex.*" &&
 
 		# The next three commands must be run within the same
@@ -100,8 +100,6 @@ do
 		# corresponding replacement cache entry with smudged
 		# stat data should be added to the new split index, so
 		# the file wont appear clean for subsequent git commands.
-		#
-		# Alas, such a smudged replacement entry is not added!
 		git update-index --add other-file &&
 
 		check_cached_diff
-- 
2.19.0.rc0.188.g56c5ee2db1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 4/5] t1700-split-index: date back files to avoid racy situations
  2018-09-06  2:48 ` [RFC PATCH 4/5] t1700-split-index: date back files to avoid racy situations SZEDER Gábor
@ 2018-09-06  8:02   ` Ævar Arnfjörð Bjarmason
  2018-09-06  9:15     ` SZEDER Gábor
  0 siblings, 1 reply; 20+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-06  8:02 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, Junio C Hamano, Duy Nguyen, Thomas Gummerer


On Thu, Sep 06 2018, SZEDER Gábor wrote:

> +# Create a file named as $1 with content read from stdin.
> +# Set the file's mtime to a few seconds in the past to avoid racy situations.
> +create_file () {

Nit: Would be easier to read in the future as
e.g. s/create_file/create_old_file/ or something like that. I.e. so you
can see at a glance from the tests themselves what this is doing.

>  test_expect_success 'modify original file, base index untouched' '
> -	echo modified >one &&
> +	echo modified |create_file one &&

Style nit: missing whitespace after "|", i.e. should be "echo modified |
create_file one".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] t0090: disable GIT_TEST_SPLIT_INDEX for the test checking split index
  2018-09-06  2:48 ` [PATCH 2/5] t0090: disable GIT_TEST_SPLIT_INDEX for the test checking split index SZEDER Gábor
@ 2018-09-06  8:03   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 20+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-06  8:03 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, Junio C Hamano, Duy Nguyen, Thomas Gummerer


On Thu, Sep 06 2018, SZEDER Gábor wrote:
>  '
>
> +

^ stray newline being added here that shouldn't be added.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 4/5] t1700-split-index: date back files to avoid racy situations
  2018-09-06  8:02   ` Ævar Arnfjörð Bjarmason
@ 2018-09-06  9:15     ` SZEDER Gábor
  2018-09-06  9:20       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 20+ messages in thread
From: SZEDER Gábor @ 2018-09-06  9:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Duy Nguyen, Thomas Gummerer

On Thu, Sep 06, 2018 at 10:02:43AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Sep 06 2018, SZEDER Gábor wrote:
> 
> > +# Create a file named as $1 with content read from stdin.
> > +# Set the file's mtime to a few seconds in the past to avoid racy situations.
> > +create_file () {
> 
> Nit: Would be easier to read in the future as
> e.g. s/create_file/create_old_file/ or something like that. I.e. so you
> can see at a glance from the tests themselves what this is doing.

Perhaps; but then I'm not so sure about 'old', because that's just a
property of the resulting file...  I'm inclined to rename it to
'create_non_racy_file' to also imply "why" this function is used
instead of plain old '>file'.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 4/5] t1700-split-index: date back files to avoid racy situations
  2018-09-06  9:15     ` SZEDER Gábor
@ 2018-09-06  9:20       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 20+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-06  9:20 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, Junio C Hamano, Duy Nguyen, Thomas Gummerer


On Thu, Sep 06 2018, SZEDER Gábor wrote:

> On Thu, Sep 06, 2018 at 10:02:43AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Thu, Sep 06 2018, SZEDER Gábor wrote:
>> 
>> > +# Create a file named as $1 with content read from stdin.
>> > +# Set the file's mtime to a few seconds in the past to avoid racy situations.
>> > +create_file () {
>> 
>> Nit: Would be easier to read in the future as
>> e.g. s/create_file/create_old_file/ or something like that. I.e. so you
>> can see at a glance from the tests themselves what this is doing.
>
> Perhaps; but then I'm not so sure about 'old', because that's just a
> property of the resulting file...  I'm inclined to rename it to
> 'create_non_racy_file' to also imply "why" this function is used
> instead of plain old '>file'.

Yeah, that's much better.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index
  2018-09-06  2:48 ` [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index SZEDER Gábor
@ 2018-09-06 10:26   ` Ævar Arnfjörð Bjarmason
  2018-09-06 12:26   ` Ævar Arnfjörð Bjarmason
  2018-09-08 16:45   ` Duy Nguyen
  2 siblings, 0 replies; 20+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-06 10:26 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, Junio C Hamano, Duy Nguyen, Thomas Gummerer


On Thu, Sep 06 2018, SZEDER Gábor wrote:

>   t3903-stash.sh:
>     https://travis-ci.org/git/git/jobs/385542084#L5858
>
>   t4024-diff-optimize-common.sh:
>     https://travis-ci.org/git/git/jobs/386531969#L3174
>
>   t4015-diff-whitespace.sh:
>     https://travis-ci.org/git/git/jobs/360797600#L8215
>
>   t2200-add-update.sh:
>     https://travis-ci.org/git/git/jobs/382543426#L3051
>
>   t0090-cache-tree:
>     https://travis-ci.org/git/git/jobs/416583010#L3679

Nit: That last test name is missing its ".sh", as I found out when
copy/pasting all of these into a terminal :)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index
  2018-09-06  2:48 ` [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index SZEDER Gábor
  2018-09-06 10:26   ` Ævar Arnfjörð Bjarmason
@ 2018-09-06 12:26   ` Ævar Arnfjörð Bjarmason
  2018-09-06 15:14     ` SZEDER Gábor
  2018-09-08 16:45   ` Duy Nguyen
  2 siblings, 1 reply; 20+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-06 12:26 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, Junio C Hamano, Duy Nguyen, Thomas Gummerer


On Thu, Sep 06 2018, SZEDER Gábor wrote:

> Ever since the split index feature was introduced [1], refreshing a
> split index is prone to a variant of the classic racy git problem.
>
> Consider the following sequence of commands updating the split index
> when the shared index contains a racily clean cache entry, i.e. an
> entry whose cached stat data matches with the corresponding file in
> the worktree and the cached mtime matches that of the index:
>
>   echo "cached content" >file
>   git update-index --split-index --add file
>   echo "dirty worktree" >file    # size stays the same!
>   # ... wait ...
>   git update-index --add other-file
>
> Normally, when a non-split index is updated, then do_write_index()
> (the function responsible for writing all kinds of indexes, "regular",
> split, and shared) recognizes racily clean cache entries, and writes
> them with smudged stat data, i.e. with file size set to 0.  When
> subsequent git commands read the index, they will notice that the
> smudged stat data doesn't match with the file in the worktree, and
> then go on to check the file's content.
>
> In the above example, however, in the second 'git update-index'
> prepare_to_write_split_index() gathers all cache entries that should
> be written to the new split index.  Alas, this function never looks
> out for racily clean cache entries, and since the file's stat data in
> the worktree hasn't changed since the shared index was written, it
> won't be replaced in the new split index.  Consequently,
> do_write_index() doesn't even get this racily clean cache entry, and
> can't smudge its stat data.  Subsequent git commands will then see
> that the index has more recent mtime than the file and that the (not
> smudged) cached stat data still matches with the file in the worktree,
> and, ultimately, will erroneously consider the file clean.
>
> Modify prepare_to_write_split_index() to recognize racily clean cache
> entries, and mark them to be added to the split index.  This way
> do_write_index() will get these racily clean cache entries as well,
> and will then write them with smudged stat data to the new split
> index.
>
> Note that after this change if the index is split when it contains a
> racily clean cache entry, then a smudged cache entry will be written
> both to the new shared and to the new split indexes.  This doesn't
> affect regular git commands: as far as they are concerned this is just
> an entry in the split index replacing an outdated entry in the shared
> index.  It did affect a few tests in 't1700-split-index.sh', though,
> because they actually check which entries are stored in the split
> index; the previous patch made the necessary adjustments.  And racily
> clean cache entries and index splitting are rare enough to not worry
> about the resulting duplicated smudged cache entries, and the
> additional complexity required to prevent them is not worth it.
>
> Several tests failed occasionally when the test suite was run with
> 'GIT_TEST_SPLIT_INDEX=yes'.  Here are those that I managed to trace
> back to this racy split index problem, starting with those failing
> more frequently, with a link to a failing Travis CI build job for
> each.  The highlighted line shows when the racy file was written,
> which is not always in the failing test (but note that those lines are
> in the 'after failure' fold, and your browser might unhelpfully fold
> it up before you could take a good look).

Thanks for working on this. When I package up git I run the tests
under a few different modes, in the case of split index I've been
doing:

    GIT_TEST_SPLIT_INDEX=true GIT_SKIP_TESTS="t3903 t4015.77"

Since those were the ones I spotted failing under that mode, but
I still had occasional other failures, I don't have a record of
which, maybe some of these other tests you mention, maybe not.

To test how this this series improves things, I've been running
this on a 56 core CentOS 7.5 machine:

    while true; do GIT_TEST_SPLIT_INDEX=yes prove -j$(parallel --number-of-cores) t3903-stash.sh t4024-diff-optimize-common.sh t4015-diff-whitespace.sh t2200-add-update.sh t0090-cache-tree.sh && echo "OK $(date) $(git describe)" >>log2 || echo "FAIL $(date) $(git describe)" >>log2; done

While, in another window to get some load on the machine (these seem to
fail more under load):

    while true; do prove -j$(parallel --number-of-cores) t[156789]*.sh; done

The results with this series applied up to 4/5. I.e. without the actual
fix:

     92 OK v2.19.0-rc2-6-ged839bd155
      8 FAIL v2.19.0-rc2-6-ged839bd155

I.e. when running this 100 times, I got 8 failures. So 8%.

With this patch applied:

    389 OK v2.19.0-rc2-5-g05a5a13935
     11 FAIL v2.19.0-rc2-5-g05a5a13935

This time I ran the tests 400 times, and got 11 failures, i.e. a
~2.8% failure rate. I don't have a full account of what stuff
failed (this was just scrolling past in my terminal), but most
were:

    t0090-cache-tree.sh          (Wstat: 256 Tests: 21 Failed: 3)
      Failed tests:  10-12
      Non-zero exit status: 1

I.e. these tests:

    ok 10 - commit --interactive gives cache-tree on partial commit
    ok 11 - commit in child dir has cache-tree
    ok 12 - reset --hard gives cache-tree

Then I saw two of these fail, and no other failures:

    t3903-stash.sh               (Wstat: 256 Tests: 90 Failed: 1)
      Failed test:  55
      Non-zero exit status: 1

I.e. this:

    ok 55 - stash branch should not drop the stash if the apply fails

I don't have output from those under -x -v. I'm running them in a loop
now to try to make them fail like that, no luck yet, maybe one of those
options "fixes" the race condition, or I'm just unlucky.

[Side note: All of the above is just assuming that running the tests in
 a loop without GIT_TEST_SPLIT_INDEX=yes will work, but I haven't
 actually tested that, but I've never seen one of these transitory
 failures in the past without GIT_TEST_SPLIT_INDEX=yes, so I'm fairly
 sure that works]

So this definitely seems like an improvement, i.e. the transitory
failure rate is much lower now, but it looks like there's still some
race condition related to split index left to solve.

However, one thing that makes me paranoid is that without your patch I
do get failures on t3903-stash.sh, but it's a *different* failure than
the (much less likely to happen) failure after your patch. I.e. I've
only seen it fail like this before:

    t3903-stash.sh               (Wstat: 256 Tests: 90 Failed: 1)
      Failed test:  60
      Non-zero exit status: 1

That's this test, i.e. #60, not the #55 test that occasionally fails
after these patches:

    ok 60 - handle stash specification with spaces

This series doesn't change t3903-stash.sh at all, so this really is a
different failure.

You *do* modify t0090-cache-tree.sh, but the 10-12 tests failing with
your patch are earlier in the file than the new test you added, so I
believe that's a new failure as well. It could just be sampling bias /
bad luck, but I don't have a single failure for t0090-cache-tree.sh
without this patch, and with the patch it's the most common failure.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index
  2018-09-06 12:26   ` Ævar Arnfjörð Bjarmason
@ 2018-09-06 15:14     ` SZEDER Gábor
  2018-09-06 15:26       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 20+ messages in thread
From: SZEDER Gábor @ 2018-09-06 15:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Duy Nguyen, Thomas Gummerer, Jeff King

On Thu, Sep 06, 2018 at 02:26:49PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Sep 06 2018, SZEDER Gábor wrote:
> > Several tests failed occasionally when the test suite was run with
> > 'GIT_TEST_SPLIT_INDEX=yes'.  Here are those that I managed to trace
> > back to this racy split index problem, starting with those failing
> > more frequently, with a link to a failing Travis CI build job for
> > each.  The highlighted line shows when the racy file was written,
> > which is not always in the failing test (but note that those lines are
> > in the 'after failure' fold, and your browser might unhelpfully fold
> > it up before you could take a good look).
> 
> Thanks for working on this. When I package up git I run the tests
> under a few different modes, in the case of split index I've been
> doing:
> 
>     GIT_TEST_SPLIT_INDEX=true GIT_SKIP_TESTS="t3903 t4015.77"

Yeah, I noticed that you mentioned this in an unrelated thread the
other day, that's why I put you on Cc.  ... and that's why I pulled
this series from the backburner and cleaned it up for submission.
(Gah, most of these commits have an author date back in February...)

> Since those were the ones I spotted failing under that mode, but
> I still had occasional other failures, I don't have a record of
> which, maybe some of these other tests you mention, maybe not.

I poked around the Travis CI API, and managed to get the logs of all
build jobs that failed with 'GIT_TEST_SPLIT_INDEX=yes' but succeeded
without it.  Here is the list of failed test scripts, along with how
many times they failed:

      1 t0027-auto-crlf.sh
      1 t0090-cache-tree.sh
      1 t3404-rebase-interactive.sh
      1 t5520-pull.sh
      1 t5573-pull-verify-signatures.sh
      1 t5608-clone-2gb.sh
      1 t7406-submodule-update.sh
      2 t2200-add-update.sh
      2 t4002-diff-basic.sh
      2 t5504-fetch-receive-strict.sh
      3 t0028-working-tree-encoding.sh
      4 t1000-read-tree-m-3way.sh
      6 t4015-diff-whitespace.sh
     10 t4024-diff-optimize-common.sh
     17 t3030-merge-recursive.sh
     17 t3402-rebase-merge.sh
     17 t3501-revert-cherry-pick.sh
     17 t6022-merge-rename.sh
     17 t6032-merge-large-rename.sh
     17 t6034-merge-rename-nocruft.sh
     17 t6042-merge-rename-corner-cases.sh
     17 t6043-merge-rename-directories.sh
     17 t6046-merge-skip-unneeded-updates.sh
     17 t7003-filter-branch.sh
     17 t7601-merge-pull-config.sh
     23 t3903-stash.sh

I doubt that this racy split index problem is responsible for all
these failures; e.g. I looked at the failures of a few merge-related
test scripts, and these logs show that they tend to fail because of
memory corruption, i.e. with 'Aborted (core dumped)' or 'Segmentation
fault (core dumped)'.

> To test how this this series improves things, I've been running
> this on a 56 core CentOS 7.5 machine:
> 
>     while true; do GIT_TEST_SPLIT_INDEX=yes prove -j$(parallel --number-of-cores) t3903-stash.sh t4024-diff-optimize-common.sh t4015-diff-whitespace.sh t2200-add-update.sh t0090-cache-tree.sh && echo "OK $(date) $(git describe)" >>log2 || echo "FAIL $(date) $(git describe)" >>log2; done
> 
> While, in another window to get some load on the machine (these seem to
> fail more under load):
> 
>     while true; do prove -j$(parallel --number-of-cores) t[156789]*.sh; done
> 
> The results with this series applied up to 4/5. I.e. without the actual
> fix:
> 
>      92 OK v2.19.0-rc2-6-ged839bd155
>       8 FAIL v2.19.0-rc2-6-ged839bd155
> 
> I.e. when running this 100 times, I got 8 failures. So 8%.

Lucky you ;)

I could only get t3903 to fail on me, but not the others.  That was
enough to eventually track down and fix the problem (fun! ;), and then
I looked at the logs of failed git/git Travis CI build jobs to see,
what other failures might have been caused by it.

> With this patch applied:
> 
>     389 OK v2.19.0-rc2-5-g05a5a13935
>      11 FAIL v2.19.0-rc2-5-g05a5a13935
> 
> This time I ran the tests 400 times, and got 11 failures, i.e. a
> ~2.8% failure rate. I don't have a full account of what stuff
> failed (this was just scrolling past in my terminal), but most
> were:
> 
>     t0090-cache-tree.sh          (Wstat: 256 Tests: 21 Failed: 3)
>       Failed tests:  10-12
>       Non-zero exit status: 1
> 
> I.e. these tests:
> 
>     ok 10 - commit --interactive gives cache-tree on partial commit
>     ok 11 - commit in child dir has cache-tree
>     ok 12 - reset --hard gives cache-tree

But hey, 't0090 --verbose-log -x -i' just failed on me with the fix
applied while writing this email, yay!  In its logs I see the
following errors in all three tests you mention:

  error: index uses ?½þA extension, which we do not understand
  fatal: index file corrupt

Test 13 then starts with 'rm -f .git/index', and then all is well for
the remaining tests.

There was a recent discussion about a similar error starting at:

  https://public-inbox.org/git/20180901214157.hxlqmbz3fds7hsdl@ltop.local/

and leading to a fix in 6c003d6ffb (reopen_tempfile(): truncate opened
file, 2018-09-04).  I certainly hope that not my fix is buggy, but
combined with 'GIT_TEST_SPLIT_INDEX=yes' it can occasionally trigger
the same error, and Peff's (Cc'd) fix will resolve it as well.

Could you run your stress tests with Peff's fix merged with mine?
Apparetly your setup is much more capable of triggering these issues
than mine...


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index
  2018-09-06 15:14     ` SZEDER Gábor
@ 2018-09-06 15:26       ` Ævar Arnfjörð Bjarmason
  2018-09-06 17:53         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 20+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-06 15:26 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Junio C Hamano, Duy Nguyen, Thomas Gummerer, Jeff King


On Thu, Sep 06 2018, SZEDER Gábor wrote:

> On Thu, Sep 06, 2018 at 02:26:49PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Thu, Sep 06 2018, SZEDER Gábor wrote:
>> > Several tests failed occasionally when the test suite was run with
>> > 'GIT_TEST_SPLIT_INDEX=yes'.  Here are those that I managed to trace
>> > back to this racy split index problem, starting with those failing
>> > more frequently, with a link to a failing Travis CI build job for
>> > each.  The highlighted line shows when the racy file was written,
>> > which is not always in the failing test (but note that those lines are
>> > in the 'after failure' fold, and your browser might unhelpfully fold
>> > it up before you could take a good look).
>>
>> Thanks for working on this. When I package up git I run the tests
>> under a few different modes, in the case of split index I've been
>> doing:
>>
>>     GIT_TEST_SPLIT_INDEX=true GIT_SKIP_TESTS="t3903 t4015.77"
>
> Yeah, I noticed that you mentioned this in an unrelated thread the
> other day, that's why I put you on Cc.  ... and that's why I pulled
> this series from the backburner and cleaned it up for submission.
> (Gah, most of these commits have an author date back in February...)
>
>> Since those were the ones I spotted failing under that mode, but
>> I still had occasional other failures, I don't have a record of
>> which, maybe some of these other tests you mention, maybe not.
>
> I poked around the Travis CI API, and managed to get the logs of all
> build jobs that failed with 'GIT_TEST_SPLIT_INDEX=yes' but succeeded
> without it.  Here is the list of failed test scripts, along with how
> many times they failed:
>
>       1 t0027-auto-crlf.sh
>       1 t0090-cache-tree.sh
>       1 t3404-rebase-interactive.sh
>       1 t5520-pull.sh
>       1 t5573-pull-verify-signatures.sh
>       1 t5608-clone-2gb.sh
>       1 t7406-submodule-update.sh
>       2 t2200-add-update.sh
>       2 t4002-diff-basic.sh
>       2 t5504-fetch-receive-strict.sh
>       3 t0028-working-tree-encoding.sh
>       4 t1000-read-tree-m-3way.sh
>       6 t4015-diff-whitespace.sh
>      10 t4024-diff-optimize-common.sh
>      17 t3030-merge-recursive.sh
>      17 t3402-rebase-merge.sh
>      17 t3501-revert-cherry-pick.sh
>      17 t6022-merge-rename.sh
>      17 t6032-merge-large-rename.sh
>      17 t6034-merge-rename-nocruft.sh
>      17 t6042-merge-rename-corner-cases.sh
>      17 t6043-merge-rename-directories.sh
>      17 t6046-merge-skip-unneeded-updates.sh
>      17 t7003-filter-branch.sh
>      17 t7601-merge-pull-config.sh
>      23 t3903-stash.sh
>
> I doubt that this racy split index problem is responsible for all
> these failures; e.g. I looked at the failures of a few merge-related
> test scripts, and these logs show that they tend to fail because of
> memory corruption, i.e. with 'Aborted (core dumped)' or 'Segmentation
> fault (core dumped)'.
>
>> To test how this this series improves things, I've been running
>> this on a 56 core CentOS 7.5 machine:
>>
>>     while true; do GIT_TEST_SPLIT_INDEX=yes prove -j$(parallel --number-of-cores) t3903-stash.sh t4024-diff-optimize-common.sh t4015-diff-whitespace.sh t2200-add-update.sh t0090-cache-tree.sh && echo "OK $(date) $(git describe)" >>log2 || echo "FAIL $(date) $(git describe)" >>log2; done
>>
>> While, in another window to get some load on the machine (these seem to
>> fail more under load):
>>
>>     while true; do prove -j$(parallel --number-of-cores) t[156789]*.sh; done
>>
>> The results with this series applied up to 4/5. I.e. without the actual
>> fix:
>>
>>      92 OK v2.19.0-rc2-6-ged839bd155
>>       8 FAIL v2.19.0-rc2-6-ged839bd155
>>
>> I.e. when running this 100 times, I got 8 failures. So 8%.
>
> Lucky you ;)
>
> I could only get t3903 to fail on me, but not the others.  That was
> enough to eventually track down and fix the problem (fun! ;), and then
> I looked at the logs of failed git/git Travis CI build jobs to see,
> what other failures might have been caused by it.
>
>> With this patch applied:
>>
>>     389 OK v2.19.0-rc2-5-g05a5a13935
>>      11 FAIL v2.19.0-rc2-5-g05a5a13935
>>
>> This time I ran the tests 400 times, and got 11 failures, i.e. a
>> ~2.8% failure rate. I don't have a full account of what stuff
>> failed (this was just scrolling past in my terminal), but most
>> were:
>>
>>     t0090-cache-tree.sh          (Wstat: 256 Tests: 21 Failed: 3)
>>       Failed tests:  10-12
>>       Non-zero exit status: 1
>>
>> I.e. these tests:
>>
>>     ok 10 - commit --interactive gives cache-tree on partial commit
>>     ok 11 - commit in child dir has cache-tree
>>     ok 12 - reset --hard gives cache-tree
>
> But hey, 't0090 --verbose-log -x -i' just failed on me with the fix
> applied while writing this email, yay!  In its logs I see the
> following errors in all three tests you mention:
>
>   error: index uses ?½þA extension, which we do not understand
>   fatal: index file corrupt
>
> Test 13 then starts with 'rm -f .git/index', and then all is well for
> the remaining tests.
>
> There was a recent discussion about a similar error starting at:
>
>   https://public-inbox.org/git/20180901214157.hxlqmbz3fds7hsdl@ltop.local/
>
> and leading to a fix in 6c003d6ffb (reopen_tempfile(): truncate opened
> file, 2018-09-04).  I certainly hope that not my fix is buggy, but
> combined with 'GIT_TEST_SPLIT_INDEX=yes' it can occasionally trigger
> the same error, and Peff's (Cc'd) fix will resolve it as well.
>
> Could you run your stress tests with Peff's fix merged with mine?
> Apparetly your setup is much more capable of triggering these issues
> than mine...

I get the same sort of thing on t0090-cache-tree.sh with -v -x,
i.e. failures due to:

    error: index uses �)�? extension, which we do not understand
    fatal: index file corrupt

It turns out that my inability to reproduce that earlier was because I'd
forgotten to set GIT_TEST_SPLIT_INDEX=yes in the environment for that
while loop, so I wasn't testing the split index at all.

I'm now running the tests in a lop with 6c003d6ffb cherry-picked on
top. I'll report back when I have enough data to say if/how it helped.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index
  2018-09-06 15:26       ` Ævar Arnfjörð Bjarmason
@ 2018-09-06 17:53         ` Ævar Arnfjörð Bjarmason
  2018-09-07  3:49           ` SZEDER Gábor
  2018-09-10 22:12           ` Paul-Sebastian Ungureanu
  0 siblings, 2 replies; 20+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-06 17:53 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Junio C Hamano, Duy Nguyen, Thomas Gummerer, Jeff King,
	Johannes Schindelin, Paul-Sebastian Ungureanu, Joel Teichroeb


On Thu, Sep 06 2018, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Sep 06 2018, SZEDER Gábor wrote:
>
>> On Thu, Sep 06, 2018 at 02:26:49PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> On Thu, Sep 06 2018, SZEDER Gábor wrote:
>>> > Several tests failed occasionally when the test suite was run with
>>> > 'GIT_TEST_SPLIT_INDEX=yes'.  Here are those that I managed to trace
>>> > back to this racy split index problem, starting with those failing
>>> > more frequently, with a link to a failing Travis CI build job for
>>> > each.  The highlighted line shows when the racy file was written,
>>> > which is not always in the failing test (but note that those lines are
>>> > in the 'after failure' fold, and your browser might unhelpfully fold
>>> > it up before you could take a good look).
>>>
>>> Thanks for working on this. When I package up git I run the tests
>>> under a few different modes, in the case of split index I've been
>>> doing:
>>>
>>>     GIT_TEST_SPLIT_INDEX=true GIT_SKIP_TESTS="t3903 t4015.77"
>>
>> Yeah, I noticed that you mentioned this in an unrelated thread the
>> other day, that's why I put you on Cc.  ... and that's why I pulled
>> this series from the backburner and cleaned it up for submission.
>> (Gah, most of these commits have an author date back in February...)
>>
>>> Since those were the ones I spotted failing under that mode, but
>>> I still had occasional other failures, I don't have a record of
>>> which, maybe some of these other tests you mention, maybe not.
>>
>> I poked around the Travis CI API, and managed to get the logs of all
>> build jobs that failed with 'GIT_TEST_SPLIT_INDEX=yes' but succeeded
>> without it.  Here is the list of failed test scripts, along with how
>> many times they failed:
>>
>>       1 t0027-auto-crlf.sh
>>       1 t0090-cache-tree.sh
>>       1 t3404-rebase-interactive.sh
>>       1 t5520-pull.sh
>>       1 t5573-pull-verify-signatures.sh
>>       1 t5608-clone-2gb.sh
>>       1 t7406-submodule-update.sh
>>       2 t2200-add-update.sh
>>       2 t4002-diff-basic.sh
>>       2 t5504-fetch-receive-strict.sh
>>       3 t0028-working-tree-encoding.sh
>>       4 t1000-read-tree-m-3way.sh
>>       6 t4015-diff-whitespace.sh
>>      10 t4024-diff-optimize-common.sh
>>      17 t3030-merge-recursive.sh
>>      17 t3402-rebase-merge.sh
>>      17 t3501-revert-cherry-pick.sh
>>      17 t6022-merge-rename.sh
>>      17 t6032-merge-large-rename.sh
>>      17 t6034-merge-rename-nocruft.sh
>>      17 t6042-merge-rename-corner-cases.sh
>>      17 t6043-merge-rename-directories.sh
>>      17 t6046-merge-skip-unneeded-updates.sh
>>      17 t7003-filter-branch.sh
>>      17 t7601-merge-pull-config.sh
>>      23 t3903-stash.sh
>>
>> I doubt that this racy split index problem is responsible for all
>> these failures; e.g. I looked at the failures of a few merge-related
>> test scripts, and these logs show that they tend to fail because of
>> memory corruption, i.e. with 'Aborted (core dumped)' or 'Segmentation
>> fault (core dumped)'.
>>
>>> To test how this this series improves things, I've been running
>>> this on a 56 core CentOS 7.5 machine:
>>>
>>>     while true; do GIT_TEST_SPLIT_INDEX=yes prove -j$(parallel --number-of-cores) t3903-stash.sh t4024-diff-optimize-common.sh t4015-diff-whitespace.sh t2200-add-update.sh t0090-cache-tree.sh && echo "OK $(date) $(git describe)" >>log2 || echo "FAIL $(date) $(git describe)" >>log2; done
>>>
>>> While, in another window to get some load on the machine (these seem to
>>> fail more under load):
>>>
>>>     while true; do prove -j$(parallel --number-of-cores) t[156789]*.sh; done
>>>
>>> The results with this series applied up to 4/5. I.e. without the actual
>>> fix:
>>>
>>>      92 OK v2.19.0-rc2-6-ged839bd155
>>>       8 FAIL v2.19.0-rc2-6-ged839bd155
>>>
>>> I.e. when running this 100 times, I got 8 failures. So 8%.
>>
>> Lucky you ;)
>>
>> I could only get t3903 to fail on me, but not the others.  That was
>> enough to eventually track down and fix the problem (fun! ;), and then
>> I looked at the logs of failed git/git Travis CI build jobs to see,
>> what other failures might have been caused by it.
>>
>>> With this patch applied:
>>>
>>>     389 OK v2.19.0-rc2-5-g05a5a13935
>>>      11 FAIL v2.19.0-rc2-5-g05a5a13935
>>>
>>> This time I ran the tests 400 times, and got 11 failures, i.e. a
>>> ~2.8% failure rate. I don't have a full account of what stuff
>>> failed (this was just scrolling past in my terminal), but most
>>> were:
>>>
>>>     t0090-cache-tree.sh          (Wstat: 256 Tests: 21 Failed: 3)
>>>       Failed tests:  10-12
>>>       Non-zero exit status: 1
>>>
>>> I.e. these tests:
>>>
>>>     ok 10 - commit --interactive gives cache-tree on partial commit
>>>     ok 11 - commit in child dir has cache-tree
>>>     ok 12 - reset --hard gives cache-tree
>>
>> But hey, 't0090 --verbose-log -x -i' just failed on me with the fix
>> applied while writing this email, yay!  In its logs I see the
>> following errors in all three tests you mention:
>>
>>   error: index uses ?½þA extension, which we do not understand
>>   fatal: index file corrupt
>>
>> Test 13 then starts with 'rm -f .git/index', and then all is well for
>> the remaining tests.
>>
>> There was a recent discussion about a similar error starting at:
>>
>>   https://public-inbox.org/git/20180901214157.hxlqmbz3fds7hsdl@ltop.local/
>>
>> and leading to a fix in 6c003d6ffb (reopen_tempfile(): truncate opened
>> file, 2018-09-04).  I certainly hope that not my fix is buggy, but
>> combined with 'GIT_TEST_SPLIT_INDEX=yes' it can occasionally trigger
>> the same error, and Peff's (Cc'd) fix will resolve it as well.
>>
>> Could you run your stress tests with Peff's fix merged with mine?
>> Apparetly your setup is much more capable of triggering these issues
>> than mine...
>
> I get the same sort of thing on t0090-cache-tree.sh with -v -x,
> i.e. failures due to:
>
>     error: index uses �)�? extension, which we do not understand
>     fatal: index file corrupt
>
> It turns out that my inability to reproduce that earlier was because I'd
> forgotten to set GIT_TEST_SPLIT_INDEX=yes in the environment for that
> while loop, so I wasn't testing the split index at all.
>
> I'm now running the tests in a lop with 6c003d6ffb cherry-picked on
> top. I'll report back when I have enough data to say if/how it helped.

I got 436 OK runs with that and 3 failures before I gave up and ctrl+c'd
it. And the 3 failures were:

    t3903-stash.sh               (Wstat: 256 Tests: 90 Failed: 1)
      Failed test:  55
      Non-zero exit status: 1

So it's back to failing on the same test as before your patches.

I did try merging in "pu" to get git-stash in C. There I had 2 failures
and 135 OK before I gave up. The C version failed tests 5 & 20, so there
may be some new regressions in the C version related to this.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] t1700-split-index: drop unnecessary 'grep'
  2018-09-06  2:48 ` [PATCH 1/5] t1700-split-index: drop unnecessary 'grep' SZEDER Gábor
@ 2018-09-06 21:24   ` Junio C Hamano
  2018-09-08 13:50   ` Duy Nguyen
  1 sibling, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2018-09-06 21:24 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Duy Nguyen, Thomas Gummerer, Ævar Arnfjörð Bjarmason

SZEDER Gábor <szeder.dev@gmail.com> writes:

> The test 'disable split index' in 't1700-split-index.sh' runs the
> following pipeline:
>
>   cmd | grep <pattern> | sed s///
>
> Drop that 'grep' from the pipeline, and let 'sed' take over its
> duties.
>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---

Obviously good ;-)  Thanks.


>  t/t1700-split-index.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
> index b3b4d83eaf..be22398a85 100755
> --- a/t/t1700-split-index.sh
> +++ b/t/t1700-split-index.sh
> @@ -57,7 +57,7 @@ test_expect_success 'disable split index' '
>  	EOF
>  	test_cmp ls-files.expect ls-files.actual &&
>  
> -	BASE=$(test-tool dump-split-index .git/index | grep "^own" | sed "s/own/base/") &&
> +	BASE=$(test-tool dump-split-index .git/index | sed -n "s/^own/base/p") &&
>  	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
>  	cat >expect <<-EOF &&
>  	not a split index

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index
  2018-09-06 17:53         ` Ævar Arnfjörð Bjarmason
@ 2018-09-07  3:49           ` SZEDER Gábor
  2018-09-10 22:12           ` Paul-Sebastian Ungureanu
  1 sibling, 0 replies; 20+ messages in thread
From: SZEDER Gábor @ 2018-09-07  3:49 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Duy Nguyen, Thomas Gummerer, Jeff King,
	Johannes Schindelin, Paul-Sebastian Ungureanu, Joel Teichroeb

On Thu, Sep 06, 2018 at 07:53:41PM +0200, Ævar Arnfjörð Bjarmason wrote:
> I got 436 OK runs with that and 3 failures before I gave up and ctrl+c'd
> it. And the 3 failures were:
> 
>     t3903-stash.sh               (Wstat: 256 Tests: 90 Failed: 1)
>       Failed test:  55
>       Non-zero exit status: 1
> 
> So it's back to failing on the same test as before your patches.

Ah, what a pity :)

Luckily, the tests in t3903 are mostly self-contained, and copying
test #55 into a dedicated test script still works.  Then running it
repeatedly is much faster than running the whole t3903, and failed in
a reasonable amount of time.  I could then eventually narrow it down
to the diff below, which fails rather reliably, in fact I've yet to
see it succeed.  Unfortunately, 'git stash' is rather busy with index
operations, and I couldn't figure out yet what exactly goes wrong or
how to turn it into a proper test.


diff --git a/git-stash.sh b/git-stash.sh
index 94793c1a91..04fa2a4f43 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -328,6 +328,7 @@ push_stash () {
 			git diff-index -p --cached --binary HEAD -- "$@" |
 			git apply --index -R
 		else
+			GIT_TEST_SPLIT_INDEX=yes \
 			git reset --hard -q
 		fi
 
@@ -671,6 +672,7 @@ apply_to_branch () {
 	set -- --index "$@"
 	assert_stash_like "$@"
 
+	sleep 1
 	git checkout -b $branch $REV^ &&
 	apply_stash "$@" && {
 		test -z "$IS_STASH_REF" || drop_stash "$@"
diff --git a/read-cache.c b/read-cache.c
index 8f644f68b4..9f39f29221 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2761,9 +2761,8 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	}
 
 	if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0)) {
-		int v = si->base_oid.hash[0];
-		if ((v & 15) < 6)
-			istate->cache_changed |= SPLIT_INDEX_ORDERED;
+		/* always split, to make it more deterministic */
+		istate->cache_changed |= SPLIT_INDEX_ORDERED;
 	}
 	if (too_many_not_shared_entries(istate))
 		istate->cache_changed |= SPLIT_INDEX_ORDERED;
diff --git a/t/t9999-stash-vs-split-index.sh b/t/t9999-stash-vs-split-index.sh
new file mode 100755
index 0000000000..bc2b7ccf6b
--- /dev/null
+++ b/t/t9999-stash-vs-split-index.sh
@@ -0,0 +1,20 @@
+test_description='stash vs. split index'
+
+. ./test-lib.sh
+
+test_expect_success 'stash vs. split index' '
+	sane_unset GIT_TEST_SPLIT_INDEX &&
+	git config splitIndex.maxPercentChange 100 &&
+
+	echo foo >file &&
+	GIT_TEST_SPLIT_INDEX=yes git add file &&
+	git commit -m initial &&
+	echo bar >file &&
+	git stash &&
+	echo baz >file &&
+	test_when_finished "git checkout master" &&
+	test_must_fail git stash branch new_branch stash@{0} &&
+	git rev-parse stash@{0} --
+'
+
+test_done
-- 
2.19.0.rc2.140.g09cf9e37c9


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] t1700-split-index: drop unnecessary 'grep'
  2018-09-06  2:48 ` [PATCH 1/5] t1700-split-index: drop unnecessary 'grep' SZEDER Gábor
  2018-09-06 21:24   ` Junio C Hamano
@ 2018-09-08 13:50   ` Duy Nguyen
  1 sibling, 0 replies; 20+ messages in thread
From: Duy Nguyen @ 2018-09-08 13:50 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Git Mailing List, Junio C Hamano, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason

On Thu, Sep 6, 2018 at 4:48 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> The test 'disable split index' in 't1700-split-index.sh' runs the
> following pipeline:
>
>   cmd | grep <pattern> | sed s///
>
> Drop that 'grep' from the pipeline, and let 'sed' take over its
> duties.

Years of using sed and I never realized -n could simplify this
pattern. Thank you.
-- 
Duy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index
  2018-09-06  2:48 ` [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index SZEDER Gábor
  2018-09-06 10:26   ` Ævar Arnfjörð Bjarmason
  2018-09-06 12:26   ` Ævar Arnfjörð Bjarmason
@ 2018-09-08 16:45   ` Duy Nguyen
  2 siblings, 0 replies; 20+ messages in thread
From: Duy Nguyen @ 2018-09-08 16:45 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Git Mailing List, Junio C Hamano, Thomas Gummerer,
	Ævar Arnfjörð Bjarmason

On Thu, Sep 6, 2018 at 4:48 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> Ever since the split index feature was introduced [1], refreshing a
> split index is prone to a variant of the classic racy git problem.
>
> Consider the following sequence of commands updating the split index
> when the shared index contains a racily clean cache entry, i.e. an
> entry whose cached stat data matches with the corresponding file in
> the worktree and the cached mtime matches that of the index:
>
>   echo "cached content" >file
>   git update-index --split-index --add file
>   echo "dirty worktree" >file    # size stays the same!
>   # ... wait ...
>   git update-index --add other-file
>
> Normally, when a non-split index is updated, then do_write_index()
> (the function responsible for writing all kinds of indexes, "regular",
> split, and shared) recognizes racily clean cache entries, and writes
> them with smudged stat data, i.e. with file size set to 0.  When
> subsequent git commands read the index, they will notice that the
> smudged stat data doesn't match with the file in the worktree, and
> then go on to check the file's content.
>
> In the above example, however, in the second 'git update-index'
> prepare_to_write_split_index() gathers all cache entries that should
> be written to the new split index.  Alas, this function never looks
> out for racily clean cache entries, and since the file's stat data in
> the worktree hasn't changed since the shared index was written, it
> won't be replaced in the new split index.  Consequently,
> do_write_index() doesn't even get this racily clean cache entry, and
> can't smudge its stat data.  Subsequent git commands will then see
> that the index has more recent mtime than the file and that the (not
> smudged) cached stat data still matches with the file in the worktree,
> and, ultimately, will erroneously consider the file clean.
>
> Modify prepare_to_write_split_index() to recognize racily clean cache
> entries, and mark them to be added to the split index.  This way
> do_write_index() will get these racily clean cache entries as well,
> and will then write them with smudged stat data to the new split
> index.

Ack. I was aware of the first half of of the racy solution but did not
pay attention to this smudging business.

I wonder if untracked cache is also racy like this. It also only has
half the racy solution because I only knew that much. I'll check this
later.

> Note that after this change if the index is split when it contains a
> racily clean cache entry, then a smudged cache entry will be written
> both to the new shared and to the new split indexes.  This doesn't
> affect regular git commands: as far as they are concerned this is just
> an entry in the split index replacing an outdated entry in the shared
> index.  It did affect a few tests in 't1700-split-index.sh', though,
> because they actually check which entries are stored in the split
> index; the previous patch made the necessary adjustments.  And racily
> clean cache entries and index splitting are rare enough to not worry
> about the resulting duplicated smudged cache entries, and the
> additional complexity required to prevent them is not worth it.

Yes. If we have to make updates (racy or not) we have to make updates,
and the version in the shared index becomes obsolete by design.
-- 
Duy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index
  2018-09-06 17:53         ` Ævar Arnfjörð Bjarmason
  2018-09-07  3:49           ` SZEDER Gábor
@ 2018-09-10 22:12           ` Paul-Sebastian Ungureanu
  1 sibling, 0 replies; 20+ messages in thread
From: Paul-Sebastian Ungureanu @ 2018-09-10 22:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, SZEDER Gábor
  Cc: git, Junio C Hamano, Duy Nguyen, Thomas Gummerer, Jeff King,
	Johannes Schindelin, Joel Teichroeb

Hello,

On 06.09.2018 20:53, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Sep 06 2018, Ævar Arnfjörð Bjarmason wrote:
> 
>> On Thu, Sep 06 2018, SZEDER Gábor wrote:
>>
>>> On Thu, Sep 06, 2018 at 02:26:49PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>>
>>>> On Thu, Sep 06 2018, SZEDER Gábor wrote:
>>>>> Several tests failed occasionally when the test suite was run with
>>>>> 'GIT_TEST_SPLIT_INDEX=yes'.  Here are those that I managed to trace
>>>>> back to this racy split index problem, starting with those failing
>>>>> more frequently, with a link to a failing Travis CI build job for
>>>>> each.  The highlighted line shows when the racy file was written,
>>>>> which is not always in the failing test (but note that those lines are
>>>>> in the 'after failure' fold, and your browser might unhelpfully fold
>>>>> it up before you could take a good look).
>>>>
>>>> Thanks for working on this. When I package up git I run the tests
>>>> under a few different modes, in the case of split index I've been
>>>> doing:
>>>>
>>>>      GIT_TEST_SPLIT_INDEX=true GIT_SKIP_TESTS="t3903 t4015.77"
>>>
>>> Yeah, I noticed that you mentioned this in an unrelated thread the
>>> other day, that's why I put you on Cc.  ... and that's why I pulled
>>> this series from the backburner and cleaned it up for submission.
>>> (Gah, most of these commits have an author date back in February...)
>>>
>>>> Since those were the ones I spotted failing under that mode, but
>>>> I still had occasional other failures, I don't have a record of
>>>> which, maybe some of these other tests you mention, maybe not.
>>>
>>> I poked around the Travis CI API, and managed to get the logs of all
>>> build jobs that failed with 'GIT_TEST_SPLIT_INDEX=yes' but succeeded
>>> without it.  Here is the list of failed test scripts, along with how
>>> many times they failed:
>>>
>>>        1 t0027-auto-crlf.sh
>>>        1 t0090-cache-tree.sh
>>>        1 t3404-rebase-interactive.sh
>>>        1 t5520-pull.sh
>>>        1 t5573-pull-verify-signatures.sh
>>>        1 t5608-clone-2gb.sh
>>>        1 t7406-submodule-update.sh
>>>        2 t2200-add-update.sh
>>>        2 t4002-diff-basic.sh
>>>        2 t5504-fetch-receive-strict.sh
>>>        3 t0028-working-tree-encoding.sh
>>>        4 t1000-read-tree-m-3way.sh
>>>        6 t4015-diff-whitespace.sh
>>>       10 t4024-diff-optimize-common.sh
>>>       17 t3030-merge-recursive.sh
>>>       17 t3402-rebase-merge.sh
>>>       17 t3501-revert-cherry-pick.sh
>>>       17 t6022-merge-rename.sh
>>>       17 t6032-merge-large-rename.sh
>>>       17 t6034-merge-rename-nocruft.sh
>>>       17 t6042-merge-rename-corner-cases.sh
>>>       17 t6043-merge-rename-directories.sh
>>>       17 t6046-merge-skip-unneeded-updates.sh
>>>       17 t7003-filter-branch.sh
>>>       17 t7601-merge-pull-config.sh
>>>       23 t3903-stash.sh
>>>
>>> I doubt that this racy split index problem is responsible for all
>>> these failures; e.g. I looked at the failures of a few merge-related
>>> test scripts, and these logs show that they tend to fail because of
>>> memory corruption, i.e. with 'Aborted (core dumped)' or 'Segmentation
>>> fault (core dumped)'.
>>>
>>>> To test how this this series improves things, I've been running
>>>> this on a 56 core CentOS 7.5 machine:
>>>>
>>>>      while true; do GIT_TEST_SPLIT_INDEX=yes prove -j$(parallel --number-of-cores) t3903-stash.sh t4024-diff-optimize-common.sh t4015-diff-whitespace.sh t2200-add-update.sh t0090-cache-tree.sh && echo "OK $(date) $(git describe)" >>log2 || echo "FAIL $(date) $(git describe)" >>log2; done
>>>>
>>>> While, in another window to get some load on the machine (these seem to
>>>> fail more under load):
>>>>
>>>>      while true; do prove -j$(parallel --number-of-cores) t[156789]*.sh; done
>>>>
>>>> The results with this series applied up to 4/5. I.e. without the actual
>>>> fix:
>>>>
>>>>       92 OK v2.19.0-rc2-6-ged839bd155
>>>>        8 FAIL v2.19.0-rc2-6-ged839bd155
>>>>
>>>> I.e. when running this 100 times, I got 8 failures. So 8%.
>>>
>>> Lucky you ;)
>>>
>>> I could only get t3903 to fail on me, but not the others.  That was
>>> enough to eventually track down and fix the problem (fun! ;), and then
>>> I looked at the logs of failed git/git Travis CI build jobs to see,
>>> what other failures might have been caused by it.
>>>
>>>> With this patch applied:
>>>>
>>>>      389 OK v2.19.0-rc2-5-g05a5a13935
>>>>       11 FAIL v2.19.0-rc2-5-g05a5a13935
>>>>
>>>> This time I ran the tests 400 times, and got 11 failures, i.e. a
>>>> ~2.8% failure rate. I don't have a full account of what stuff
>>>> failed (this was just scrolling past in my terminal), but most
>>>> were:
>>>>
>>>>      t0090-cache-tree.sh          (Wstat: 256 Tests: 21 Failed: 3)
>>>>        Failed tests:  10-12
>>>>        Non-zero exit status: 1
>>>>
>>>> I.e. these tests:
>>>>
>>>>      ok 10 - commit --interactive gives cache-tree on partial commit
>>>>      ok 11 - commit in child dir has cache-tree
>>>>      ok 12 - reset --hard gives cache-tree
>>>
>>> But hey, 't0090 --verbose-log -x -i' just failed on me with the fix
>>> applied while writing this email, yay!  In its logs I see the
>>> following errors in all three tests you mention:
>>>
>>>    error: index uses ?½þA extension, which we do not understand
>>>    fatal: index file corrupt
>>>
>>> Test 13 then starts with 'rm -f .git/index', and then all is well for
>>> the remaining tests.
>>>
>>> There was a recent discussion about a similar error starting at:
>>>
>>>    https://public-inbox.org/git/20180901214157.hxlqmbz3fds7hsdl@ltop.local/
>>>
>>> and leading to a fix in 6c003d6ffb (reopen_tempfile(): truncate opened
>>> file, 2018-09-04).  I certainly hope that not my fix is buggy, but
>>> combined with 'GIT_TEST_SPLIT_INDEX=yes' it can occasionally trigger
>>> the same error, and Peff's (Cc'd) fix will resolve it as well.
>>>
>>> Could you run your stress tests with Peff's fix merged with mine?
>>> Apparetly your setup is much more capable of triggering these issues
>>> than mine...
>>
>> I get the same sort of thing on t0090-cache-tree.sh with -v -x,
>> i.e. failures due to:
>>
>>      error: index uses �)�? extension, which we do not understand
>>      fatal: index file corrupt
>>
>> It turns out that my inability to reproduce that earlier was because I'd
>> forgotten to set GIT_TEST_SPLIT_INDEX=yes in the environment for that
>> while loop, so I wasn't testing the split index at all.
>>
>> I'm now running the tests in a lop with 6c003d6ffb cherry-picked on
>> top. I'll report back when I have enough data to say if/how it helped.
> 
> I got 436 OK runs with that and 3 failures before I gave up and ctrl+c'd
> it. And the 3 failures were:
> 
>      t3903-stash.sh               (Wstat: 256 Tests: 90 Failed: 1)
>        Failed test:  55
>        Non-zero exit status: 1
> 
> So it's back to failing on the same test as before your patches.
> 
> I did try merging in "pu" to get git-stash in C. There I had 2 failures
> and 135 OK before I gave up. The C version failed tests 5 & 20, so there
> may be some new regressions in the C version related to this.
> 

Sorry for late replay. I have been pretty busy these days.

Indeed, test 5 & 20 also causes problems for me. I tried to find the 
root cause, but, unfortunately, I couldn't.

For test 5 (`apply does not clobber working directory changes`), it 
seems that `git stash apply` works when it should fail.

When test 20 (`stash -k`) fails it acts like `-k` wasn't specified. In 
the following code path (or Github link below), it seems that `out` is 
not filled with names of modified files.

	cp1.git_cmd = 1;
	argv_array_pushl(&cp1.args, "ls-files", "-z",
			 "--modified", "--", NULL);
	add_pathspecs(&cp1.args, ps);
	if (pipe_command(&cp1, NULL, 0, &out, 0, NULL, 0)) {
		ret = -1;
		goto done;
	}

Github link: 
https://github.com/ungps/git/blob/git-stash/builtin/stash.c#L1358

I will look more into it tomorrow.

Best,
Paul

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-09-10 22:12 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-06  2:48 [RFC PATCH 0/5] Fix the racy split index problem SZEDER Gábor
2018-09-06  2:48 ` [PATCH 1/5] t1700-split-index: drop unnecessary 'grep' SZEDER Gábor
2018-09-06 21:24   ` Junio C Hamano
2018-09-08 13:50   ` Duy Nguyen
2018-09-06  2:48 ` [PATCH 2/5] t0090: disable GIT_TEST_SPLIT_INDEX for the test checking split index SZEDER Gábor
2018-09-06  8:03   ` Ævar Arnfjörð Bjarmason
2018-09-06  2:48 ` [RFC PATCH 3/5] split index: add a test to demonstrate the racy split index problem SZEDER Gábor
2018-09-06  2:48 ` [RFC PATCH 4/5] t1700-split-index: date back files to avoid racy situations SZEDER Gábor
2018-09-06  8:02   ` Ævar Arnfjörð Bjarmason
2018-09-06  9:15     ` SZEDER Gábor
2018-09-06  9:20       ` Ævar Arnfjörð Bjarmason
2018-09-06  2:48 ` [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index SZEDER Gábor
2018-09-06 10:26   ` Ævar Arnfjörð Bjarmason
2018-09-06 12:26   ` Ævar Arnfjörð Bjarmason
2018-09-06 15:14     ` SZEDER Gábor
2018-09-06 15:26       ` Ævar Arnfjörð Bjarmason
2018-09-06 17:53         ` Ævar Arnfjörð Bjarmason
2018-09-07  3:49           ` SZEDER Gábor
2018-09-10 22:12           ` Paul-Sebastian Ungureanu
2018-09-08 16:45   ` Duy Nguyen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).