From: Duy Nguyen <pclouds@gmail.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Thomas Gummerer" <t.gummerer@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index
Date: Sat, 8 Sep 2018 18:45:46 +0200
Message-ID: <CACsJy8CAPcD7pHgV-KWb39cFAvL1Yn+89y5znhcOgTG8O57w_A@mail.gmail.com>
In-Reply-To: <20180906024810.8074-6-szeder.dev@gmail.com>

On Thu, Sep 6, 2018 at 4:48 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> Ever since the split index feature was introduced [1], refreshing a
> split index is prone to a variant of the classic racy git problem.
>
> Consider the following sequence of commands updating the split index
> when the shared index contains a racily clean cache entry, i.e. an
> entry whose cached stat data matches with the corresponding file in
> the worktree and the cached mtime matches that of the index:
>
>   echo "cached content" >file
>   git update-index --split-index --add file
>   echo "dirty worktree" >file    # size stays the same!
>   # ... wait ...
>   git update-index --add other-file
>
> Normally, when a non-split index is updated, then do_write_index()
> (the function responsible for writing all kinds of indexes, "regular",
> split, and shared) recognizes racily clean cache entries, and writes
> them with smudged stat data, i.e. with file size set to 0.  When
> subsequent git commands read the index, they will notice that the
> smudged stat data doesn't match with the file in the worktree, and
> then go on to check the file's content.
>
> In the above example, however, in the second 'git update-index'
> prepare_to_write_split_index() gathers all cache entries that should
> be written to the new split index.  Alas, this function never looks
> out for racily clean cache entries, and since the file's stat data in
> the worktree hasn't changed since the shared index was written, it
> won't be replaced in the new split index.  Consequently,
> do_write_index() doesn't even get this racily clean cache entry, and
> can't smudge its stat data.  Subsequent git commands will then see
> that the index has more recent mtime than the file and that the (not
> smudged) cached stat data still matches with the file in the worktree,
> and, ultimately, will erroneously consider the file clean.
>
> Modify prepare_to_write_split_index() to recognize racily clean cache
> entries, and mark them to be added to the split index.  This way
> do_write_index() will get these racily clean cache entries as well,
> and will then write them with smudged stat data to the new split
> index.

Ack. I was aware of the first half of the racy solution but did not
pay attention to this smudging business.
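
For readers following along, the two halves (detecting a racy entry, then
smudging it on write) can be sketched with a toy Python model. This is not
Git's C code: every name below (Entry, is_racy, entries_to_write,
write_entries, trusted_as_clean) is hypothetical, timestamps are plain
integers, and the toy writer simply smudges every racy entry it is handed.
The numbers mirror the quoted command sequence: both file contents are 15
bytes, and the file is rewritten within the same timestamp as the shared
index.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Entry:
    """Toy cache entry holding only the stat fields that matter here."""
    path: str
    mtime: int
    size: int


def is_racy(entry, index_mtime):
    # The file may have been modified in the same timestamp tick in which
    # the index was written, so matching stat data proves nothing.
    return index_mtime <= entry.mtime


def entries_to_write(entries, index_mtime, changed_paths, check_racy):
    """Toy stand-in for choosing what goes into the new split index."""
    return [
        e for e in entries
        if e.path in changed_paths or (check_racy and is_racy(e, index_mtime))
    ]


def write_entries(entries, index_mtime):
    """Toy writer: smudge racy entries by storing size 0."""
    return [replace(e, size=0) if is_racy(e, index_mtime) else e
            for e in entries]


def trusted_as_clean(entry, file_mtime, file_size, index_mtime):
    """True when a reader would skip comparing the file's content."""
    stat_matches = (entry.mtime, entry.size) == (file_mtime, file_size)
    return stat_matches and not is_racy(entry, index_mtime)


# Shared index written at t=100; 'file' rewritten at t=100 with the same size.
shared = [Entry("file", mtime=100, size=15)]

# Second update at t=200 only touches 'other-file'.
without_fix = entries_to_write(shared, 100, {"other-file"}, check_racy=False)
with_fix = write_entries(
    entries_to_write(shared, 100, {"other-file"}, check_racy=True), 100)

# Without the fix the writer never sees 'file', so readers of the newer
# index still use the unsmudged shared entry and trust it.
print(without_fix)                                                  # []
print(trusted_as_clean(shared[0], 100, 15, index_mtime=200))        # True
# With the fix the smudged copy no longer matches the file's stat data.
print(trusted_as_clean(with_fix[0], 100, 15, index_mtime=200))      # False
```

The point of the model is the first result: the racy entry is simply absent
from what the writer receives, so there is nothing for it to smudge.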

I wonder if the untracked cache is also racy like this. It also only has
half the racy solution because I only knew that much. I'll check this
later.

> Note that after this change if the index is split when it contains a
> racily clean cache entry, then a smudged cache entry will be written
> both to the new shared and to the new split indexes.  This doesn't
> affect regular git commands: as far as they are concerned this is just
> an entry in the split index replacing an outdated entry in the shared
> index.  It did affect a few tests in 't1700-split-index.sh', though,
> because they actually check which entries are stored in the split
> index; the previous patch made the necessary adjustments.  And racily
> clean cache entries and index splitting are rare enough to not worry
> about the resulting duplicated smudged cache entries, and the
> additional complexity required to prevent them is not worth it.

Yes. If we have to make updates (racy or not) we have to make updates,
and the version in the shared index becomes obsolete by design.
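
A minimal sketch of why the duplicate is harmless, assuming a path-keyed
simplification (the real split index format is not keyed by path like this,
and effective_entries is a hypothetical name): an entry in the split index
replaces the entry with the same path in the shared index, so a reader ends
up with exactly one entry per path.

```python
def effective_entries(shared, split):
    """Toy overlay: split-index entries win over shared-index entries."""
    merged = dict(shared)   # start from the shared index
    merged.update(split)    # same-path entries in the split index replace them
    return merged


# Both indexes carry a smudged (size 0) copy of 'file' after splitting.
shared = {"file": {"mtime": 100, "size": 0}}
split = {
    "file": {"mtime": 100, "size": 0},
    "other-file": {"mtime": 200, "size": 7},
}

merged = effective_entries(shared, split)
print(sorted(merged))   # ['file', 'other-file']
```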
-- 
Duy

Thread overview: 20+ messages
2018-09-06  2:48 [RFC PATCH 0/5] Fix the racy split index problem SZEDER Gábor
2018-09-06  2:48 ` [PATCH 1/5] t1700-split-index: drop unnecessary 'grep' SZEDER Gábor
2018-09-06 21:24   ` Junio C Hamano
2018-09-08 13:50   ` Duy Nguyen
2018-09-06  2:48 ` [PATCH 2/5] t0090: disable GIT_TEST_SPLIT_INDEX for the test checking split index SZEDER Gábor
2018-09-06  8:03   ` Ævar Arnfjörð Bjarmason
2018-09-06  2:48 ` [RFC PATCH 3/5] split index: add a test to demonstrate the racy split index problem SZEDER Gábor
2018-09-06  2:48 ` [RFC PATCH 4/5] t1700-split-index: date back files to avoid racy situations SZEDER Gábor
2018-09-06  8:02   ` Ævar Arnfjörð Bjarmason
2018-09-06  9:15     ` SZEDER Gábor
2018-09-06  9:20       ` Ævar Arnfjörð Bjarmason
2018-09-06  2:48 ` [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index SZEDER Gábor
2018-09-06 10:26   ` Ævar Arnfjörð Bjarmason
2018-09-06 12:26   ` Ævar Arnfjörð Bjarmason
2018-09-06 15:14     ` SZEDER Gábor
2018-09-06 15:26       ` Ævar Arnfjörð Bjarmason
2018-09-06 17:53         ` Ævar Arnfjörð Bjarmason
2018-09-07  3:49           ` SZEDER Gábor
2018-09-10 22:12           ` Paul-Sebastian Ungureanu
2018-09-08 16:45   ` Duy Nguyen [this message]
