From: Duy Nguyen <pclouds@gmail.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Thomas Gummerer" <t.gummerer@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Paul-Sebastian Ungureanu" <ungureanupaulsebastian@gmail.com>,
git@vger.kernel.org
Subject: Re: [PATCH v3 5/6] split-index: don't compare stat data of entries already marked for split index
Date: Sat, 29 Sep 2018 07:36:08 +0200 [thread overview]
Message-ID: <20180929053608.GB20349@duynguyen.home> (raw)
In-Reply-To: <20180928162459.17138-6-szeder.dev@gmail.com>
On Fri, Sep 28, 2018 at 06:24:58PM +0200, SZEDER Gábor wrote:
> When unpack_trees() constructs a new index, it copies cache entries
> from the original index [1]. prepare_to_write_split_index() has to
> deal with this, and it has a dedicated code path for copied entries
> that are present in the shared index, where it compares the cached
> data in the corresponding copied and original entries. If the cached
> data matches, then they are considered the same; if it differs, then
> the copied entry will be marked for inclusion as a replacement entry
> in the just about to be written split index by setting the
> CE_UPDATE_IN_BASE flag.
>
> However, a cache entry already has its CE_UPDATE_IN_BASE flag set upon
> reading the split index, if the entry already has a replacement entry
> there, or upon refreshing the cached stat data, if the corresponding
> file was modified. The state of this flag is then preserved when
> unpack_trees() copies a cache entry from the shared index.
>
> So modify prepare_to_write_split_index() to check the copied cache
> entries' CE_UPDATE_IN_BASE flag first, and skip the thorough
> comparison of cached data if the flag is already set.
OK so this is an optimization, not a bug fix. Right?
> Note that comparing the cached data in copied and original entries in
s/cached data/cached stat data/ ? I was confused for a bit.
> the shared index might actually be entirely unnecessary. In theory
> all code paths refreshing the cached stat data of an entry in the
> shared index should set the CE_UPDATE_IN_BASE flag in that entry, and
> unpack_trees() should preserve this flag when copying cache entries.
> This means that the cached data is only ever changed if the
> CE_UPDATE_IN_BASE flag is set as well. Our test suite seems to
> confirm this: instrumenting the conditions in question and running the
> test suite repeatedly with 'GIT_TEST_SPLIT_INDEX=yes' showed that the
> cached data in a copied entry differs from the data in the shared
> entry only if its CE_UPDATE_IN_BASE flag is indeed set.
Yes I was probably just being paranoid (or sticking to simpler
checks). I was told that split index is computation expensive and not
doing unnecesary/expensive checks may help. But let's leave it for
later.
> + } else {
> + /*
> + * Thoroughly compare the cached data to see
> + * whether it should be marked for inclusion
> + * in the split index.
> + *
> + * This comparison might be unnecessary, as
> + * code paths modifying the cached data do
> + * set CE_UPDATE_IN_BASE as well.
> + */
> + const unsigned int ondisk_flags =
> + CE_STAGEMASK | CE_VALID |
> + CE_EXTENDED_FLAGS;
> + unsigned int ce_flags, base_flags, ret;
> + ce_flags = ce->ce_flags;
> + base_flags = base->ce_flags;
> + /* only on-disk flags matter */
> + ce->ce_flags &= ondisk_flags;
> + base->ce_flags &= ondisk_flags;
> + ret = memcmp(&ce->ce_stat_data, &base->ce_stat_data,
> + offsetof(struct cache_entry, name) -
> + offsetof(struct cache_entry, ce_stat_data));
> + ce->ce_flags = ce_flags;
> + base->ce_flags = base_flags;
Maybe make this block a separate function (compare_ce_content or
something). The amount of indentation is getting too high.
> + if (ret)
> + ce->ce_flags |= CE_UPDATE_IN_BASE;
> + }
> discard_cache_entry(base);
> si->base->cache[ce->index - 1] = ce;
> }
> --
> 2.19.0.361.gafc87ffe72
>
next prev parent reply other threads:[~2018-09-29 5:36 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-09-27 12:44 [PATCH v2 0/5] Fix the racy split index problem SZEDER Gábor
2018-09-27 12:44 ` [PATCH v2 1/5] split-index: add tests to demonstrate " SZEDER Gábor
2018-09-28 0:48 ` SZEDER Gábor
2018-09-28 2:40 ` SZEDER Gábor
2018-09-28 17:30 ` Junio C Hamano
2018-09-27 12:44 ` [PATCH v2 2/5] t1700-split-index: date back files to avoid racy situations SZEDER Gábor
2018-09-27 12:44 ` [PATCH v2 3/5] split-index: count the number of deleted entries SZEDER Gábor
2018-09-27 12:44 ` [PATCH v2 4/5] split-index: don't compare stat data of entries already marked for split index SZEDER Gábor
2018-09-27 13:43 ` SZEDER Gábor
2018-09-27 12:44 ` [PATCH v2 5/5] split-index: smudge and add racily clean cache entries to " SZEDER Gábor
2018-09-27 13:53 ` [PATCH v2 0/5] Fix the racy split index problem Ævar Arnfjörð Bjarmason
2018-09-27 14:23 ` SZEDER Gábor
2018-09-27 15:25 ` Ævar Arnfjörð Bjarmason
2018-09-28 6:57 ` Ævar Arnfjörð Bjarmason
2018-09-28 10:17 ` SZEDER Gábor
2018-10-08 14:54 ` Ævar Arnfjörð Bjarmason
2018-10-08 15:41 ` SZEDER Gábor
2018-09-28 16:24 ` [PATCH v3 0/6] " SZEDER Gábor
2018-09-28 16:24 ` [PATCH v3 1/6] t1700-split-index: document why FSMONITOR is disabled in this test script SZEDER Gábor
2018-09-28 16:24 ` [PATCH v3 2/6] split-index: add tests to demonstrate the racy split index problem SZEDER Gábor
2018-09-28 16:24 ` [PATCH v3 3/6] t1700-split-index: date back files to avoid racy situations SZEDER Gábor
2018-09-28 16:24 ` [PATCH v3 4/6] split-index: count the number of deleted entries SZEDER Gábor
2018-09-28 16:24 ` [PATCH v3 5/6] split-index: don't compare stat data of entries already marked for split index SZEDER Gábor
2018-09-29 5:36 ` Duy Nguyen [this message]
2018-09-29 9:14 ` SZEDER Gábor
2018-09-29 10:07 ` SZEDER Gábor
2018-09-28 16:24 ` [PATCH v3 6/6] split-index: smudge and add racily clean cache entries to " SZEDER Gábor
2018-09-29 5:21 ` Duy Nguyen
2018-09-29 7:57 ` SZEDER Gábor
2018-09-30 14:47 ` [PATCH v3 0/6] Fix the racy split index problem SZEDER Gábor
2018-10-05 6:15 ` Junio C Hamano
2018-10-11 9:43 ` [PATCH v4 " SZEDER Gábor
2018-10-11 9:43 ` [PATCH v4 1/6] t1700-split-index: document why FSMONITOR is disabled in this test script SZEDER Gábor
2018-10-11 9:43 ` [PATCH v4 2/6] split-index: add tests to demonstrate the racy split index problem SZEDER Gábor
2018-10-11 9:43 ` [PATCH v4 3/6] t1700-split-index: date back files to avoid racy situations SZEDER Gábor
2018-10-11 9:43 ` [PATCH v4 4/6] split-index: count the number of deleted entries SZEDER Gábor
2018-10-11 9:43 ` [PATCH v4 5/6] split-index: don't compare cached data of entries already marked for split index SZEDER Gábor
2018-10-11 9:43 ` [PATCH v4 6/6] split-index: smudge and add racily clean cache entries to " SZEDER Gábor
2018-10-11 9:53 ` [PATCH 7/6] split-index: BUG() when cache entry refers to non-existing shared entry SZEDER Gábor
2018-10-11 10:36 ` [PATCH v4 0/6] Fix the racy split index problem Ævar Arnfjörð Bjarmason
2018-10-11 11:38 ` SZEDER Gábor
2018-10-12 3:20 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180929053608.GB20349@duynguyen.home \
--to=pclouds@gmail.com \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=szeder.dev@gmail.com \
--cc=t.gummerer@gmail.com \
--cc=ungureanupaulsebastian@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).