From: Duy Nguyen <pclouds@gmail.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Thomas Gummerer" <t.gummerer@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Paul-Sebastian Ungureanu" <ungureanupaulsebastian@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v3 5/6] split-index: don't compare stat data of entries already marked for split index
Date: Sat, 29 Sep 2018 07:36:08 +0200
Message-ID: <20180929053608.GB20349@duynguyen.home>
In-Reply-To: <20180928162459.17138-6-szeder.dev@gmail.com>

On Fri, Sep 28, 2018 at 06:24:58PM +0200, SZEDER Gábor wrote:
> When unpack_trees() constructs a new index, it copies cache entries
> from the original index [1].  prepare_to_write_split_index() has to
> deal with this, and it has a dedicated code path for copied entries
> that are present in the shared index, where it compares the cached
> data in the corresponding copied and original entries.  If the cached
> data matches, then they are considered the same; if it differs, then
> the copied entry will be marked for inclusion as a replacement entry
> in the just about to be written split index by setting the
> CE_UPDATE_IN_BASE flag.
> 
> However, a cache entry already has its CE_UPDATE_IN_BASE flag set upon
> reading the split index, if the entry already has a replacement entry
> there, or upon refreshing the cached stat data, if the corresponding
> file was modified.  The state of this flag is then preserved when
> unpack_trees() copies a cache entry from the shared index.
> 
> So modify prepare_to_write_split_index() to check the copied cache
> entries' CE_UPDATE_IN_BASE flag first, and skip the thorough
> comparison of cached data if the flag is already set.

OK so this is an optimization, not a bug fix. Right?
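
For reference, the control flow described above can be sketched roughly
as follows. This is a simplified, self-contained illustration only:
struct demo_entry, mark_copied_entry() and the flag value are stand-ins
made up for the example, not the real definitions from cache.h or
split-index.c.

```c
#include <string.h>

/* Stand-in value, assumed for illustration only. */
#define CE_UPDATE_IN_BASE (1u << 18)

/* Hypothetical, heavily reduced cache entry. */
struct demo_entry {
	unsigned int ce_flags;
	unsigned int stat[4];	/* stand-in for the cached stat data */
};

/*
 * Mark a copied entry for inclusion in the split index.
 * Returns 1 if the thorough comparison was performed, 0 if it
 * was skipped because the flag was already set.
 */
static int mark_copied_entry(struct demo_entry *ce,
			     const struct demo_entry *base)
{
	if (ce->ce_flags & CE_UPDATE_IN_BASE)
		return 0;	/* already marked, nothing to compare */
	if (memcmp(ce->stat, base->stat, sizeof(ce->stat)))
		ce->ce_flags |= CE_UPDATE_IN_BASE;
	return 1;
}
```

In this sketch the result for an entry that is already marked is the
same with or without the early return; only the memcmp() is avoided.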

> Note that comparing the cached data in copied and original entries in

s/cached data/cached stat data/ ? I was confused for a bit.

> the shared index might actually be entirely unnecessary.  In theory
> all code paths refreshing the cached stat data of an entry in the
> shared index should set the CE_UPDATE_IN_BASE flag in that entry, and
> unpack_trees() should preserve this flag when copying cache entries.
> This means that the cached data is only ever changed if the
> CE_UPDATE_IN_BASE flag is set as well.  Our test suite seems to
> confirm this: instrumenting the conditions in question and running the
> test suite repeatedly with 'GIT_TEST_SPLIT_INDEX=yes' showed that the
> cached data in a copied entry differs from the data in the shared
> entry only if its CE_UPDATE_IN_BASE flag is indeed set.

Yes, I was probably just being paranoid (or sticking to simpler
checks). I was told that split index is computationally expensive, and
skipping unnecessary/expensive checks may help. But let's leave it for
later.

> +			} else {
> +				/*
> +				 * Thoroughly compare the cached data to see
> +				 * whether it should be marked for inclusion
> +				 * in the split index.
> +				 *
> +				 * This comparison might be unnecessary, as
> +				 * code paths modifying the cached data do
> +				 * set CE_UPDATE_IN_BASE as well.
> +				 */
> +				const unsigned int ondisk_flags =
> +					CE_STAGEMASK | CE_VALID |
> +					CE_EXTENDED_FLAGS;
> +				unsigned int ce_flags, base_flags, ret;
> +				ce_flags = ce->ce_flags;
> +				base_flags = base->ce_flags;
> +				/* only on-disk flags matter */
> +				ce->ce_flags   &= ondisk_flags;
> +				base->ce_flags &= ondisk_flags;
> +				ret = memcmp(&ce->ce_stat_data, &base->ce_stat_data,
> +					     offsetof(struct cache_entry, name) -
> +					     offsetof(struct cache_entry, ce_stat_data));
> +				ce->ce_flags = ce_flags;
> +				base->ce_flags = base_flags;

Maybe make this block a separate function (compare_ce_content or
something). The amount of indentation is getting too high.
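
A rough, untested sketch of what such a helper might look like. To keep
it self-contained it uses a stand-in struct demo_cache_entry and flag
values assumed for the example, not the real ones from cache.h; the
name compare_ce_content() is only the suggestion from above.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in flag values, assumed for illustration only. */
#define CE_STAGEMASK      0x3000u
#define CE_VALID          0x8000u
#define CE_EXTENDED_FLAGS ((1u << 29) | (1u << 30))
#define CE_UPDATE_IN_BASE (1u << 18)

/* Hypothetical, reduced layout: the compared range runs from
 * ce_stat_data up to (not including) name. */
struct demo_stat_data {
	unsigned int mtime_sec;
	unsigned int size;
};

struct demo_cache_entry {
	unsigned int index;
	struct demo_stat_data ce_stat_data;
	unsigned int ce_mode;
	unsigned int ce_flags;
	char name[8];
};

/*
 * Compare the cached data of two entries, ignoring in-memory-only
 * flags.  Returns 0 if they match, non-zero otherwise.  Both entries'
 * ce_flags are restored before returning.
 */
static int compare_ce_content(struct demo_cache_entry *a,
			      struct demo_cache_entry *b)
{
	const unsigned int ondisk_flags =
		CE_STAGEMASK | CE_VALID | CE_EXTENDED_FLAGS;
	unsigned int a_flags = a->ce_flags;
	unsigned int b_flags = b->ce_flags;
	int ret;

	/* only on-disk flags matter */
	a->ce_flags &= ondisk_flags;
	b->ce_flags &= ondisk_flags;
	ret = memcmp(&a->ce_stat_data, &b->ce_stat_data,
		     offsetof(struct demo_cache_entry, name) -
		     offsetof(struct demo_cache_entry, ce_stat_data));
	a->ce_flags = a_flags;
	b->ce_flags = b_flags;
	return ret;
}
```

The caller would then shrink to a single
"if (compare_ce_content(ce, base)) ce->ce_flags |= CE_UPDATE_IN_BASE;"
at one indentation level less.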

> +				if (ret)
> +					ce->ce_flags |= CE_UPDATE_IN_BASE;
> +			}
>  			discard_cache_entry(base);
>  			si->base->cache[ce->index - 1] = ce;
>  		}
> -- 
> 2.19.0.361.gafc87ffe72
> 
