On Sun, Mar 27, 2022 at 02:43:48PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Mar 26 2022, Neeraj Singh wrote:
> 
> > On Sat, Mar 26, 2022 at 8:34 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>
> >>
> >> On Fri, Mar 25 2022, Neeraj Singh wrote:
> >>
> >> > On Fri, Mar 25, 2022 at 5:33 PM Ævar Arnfjörð Bjarmason
> >> > <avarab@gmail.com> wrote:
> [...]
> >> > I want to make a comment about the Index here.  Syncing the index is
> >> > strictly required for the "added" level of consistency, so that we
> >> > don't lose stuff that leaves the work tree but is staged.  But my
> >> > Windows enlistment has an index that's 266MB, which would be painful
> >> > to sync even with all the optimizations.  Maybe with split-index, this
> >> > wouldn't be so bad, but I just wanted to call out that some advanced
> >> > users may really care about the configurability.
> >>
> >> So for that use-case you'd like to fsync the loose objects (if any), but
> >> not the index? So the FS will "flush" up to the index, and then queue
> >> the index for later syncing to platter?
> >>
> >>
> >> But even in that case don't the settings need to be tied to one another,
> >> because in the method=bulk sync=index && sync=!loose case wouldn't we be
> >> syncing "loose" in any case?
> >>
> >> > As Git's various database implementations improve, the fsync stuff
> >> > will hopefully be more optimal and self-tuning.  But as that happens,
> >> > Git could just start ignoring settings that lose meaning without tying
> >> > anyones hands.
> >>
> >> Yeah that would alleviate most of my concerns here, but the docs aren't
> >> saying anything like that. Since you added them & they just landed, do
> >> you mind doing a small follow-up where we e.g. say that these new
> >> settings are "EXPERIMENTAL" or whatever, and subject to drastic change?
> >
> > The doc is already pretty prescriptive.  It has this line at the end
> > of the first  paragraph:
> > "Unless you
> > have special requirements, it is recommended that you leave
> > this option empty or pick one of `committed`, `added`,
> > or `all`."
> >
> > Those values are already designed to change as Git changes.
> 
> I'm referring to the documentation as it stands not being marked as
> experimental in the sense that we might decide to re-do this to a large
> extent, i.e. something like the diff I suggested upthread in
> https://lore.kernel.org/git/220323.86fsn8ohg8.gmgdl@evledraar.gmail.com/
> 
> So yes, I agree that it e.g. clearly states that you can add a new
> core.git=foobar or whatever down the line, but it clearly doesn't
> suggest that e.g. core.fsync might have boolean semantics in some later
> version, or that the rest might simply be ignored, even if that
> e.g. means that we wouldn't sync loose objects on
> core.fsync=loose-object, as we'd just warn with a "we don't provide this
> anymore".
> 
> Or do you disagree with that? IOW I mean that we'd do something like
> this, either in docs or code:
> 
> diff --git a/config.c b/config.c
> index 3c9b6b589ab..94548566073 100644
> --- a/config.c
> +++ b/config.c
> @@ -1675,6 +1675,9 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
>  	}
>  
>  	if (!strcmp(var, "core.fsync")) {
> +		if (!the_repository->settings.feature_experimental)
> +			warning(_("the '%s' configuration option is EXPERIMENTAL. opt-in to use it with feature.experimental=true"),
> +				var);
>  		if (!value)
>  			return config_error_nonbool(var);
>  		fsync_components = parse_fsync_components(var, value);
> @@ -1682,6 +1685,9 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
>  	}
>  
>  	if (!strcmp(var, "core.fsyncmethod")) {
> +		if (!the_repository->settings.feature_experimental)
> +			warning(_("the '%s' configuration option is EXPERIMENTAL. opt-in to use it with feature.experimental=true"),
> +				var);
>  		if (!value)
>  			return config_error_nonbool(var);
>  		if (!strcmp(value, "fsync"))

Let's please not tie this to `feature.experimental=true`. Setting that
option has unintended side effects: it also changes other defaults, which
we may not want in production. I don't mind adding a warning to the docs,
though, that the specific items which can be configured may be subject
to change in the future.

At GitLab, we've got a three-step plan:

    1. We need to migrate to `core.fsync` in the first place. To avoid
       migrating and changing behaviour at the same time, we already
       benefit from the fine-grained config: we can simply say
       `core.fsync=loose-object` and keep the same behaviour as before
       with `core.fsyncObjectFiles=true`.

    2. We'll want to enable syncing of packfiles, which I think wasn't
       previously covered by `core.fsyncObjectFiles`.

    3. We'll add `reference` to also sync loose refs to disk.

So while the end result will be the same as `committed`, having this
level of control helps us to assess the impact in a nicer way by being
able to do this step by step with feature flags.
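
As a rough sketch, the three stages could look like this in a config
file. This assumes `pack` and `reference` are the component names for
packfiles and loose refs, and that `core.fsync` takes a comma-separated
list; please check git-config(1) for the version you run. Only one
stage is active at a time, the later ones are shown commented out:

```
[core]
	# Stage 1: same durability as before the migration
	fsync = loose-object
	# Stage 2: additionally sync packfiles
	# fsync = loose-object,pack
	# Stage 3: additionally sync loose refs
	# fsync = loose-object,pack,reference
```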

On the other hand, many of the other parts we don't really care about.
Auxiliary metadata like the commit-graph or pack indices is data that
we can regenerate in the worst case, so it's not clear to me whether it
makes sense to also enable fsyncing those in production.

So altogether, I agree with Neeraj: having this granularity greatly
helps us to roll out changes like this and to pick what we deem to be
important. Personally, though, I would be fine with explicitly pointing
out in our docs that there are two groups of values for this config:

    1. The "porcelain" group: "committed", "added", "all", "none". These
       are abstract groups whose behaviour should adapt as we change
       implementations, and are those that should typically be set by a
       user, if intended.

    2. The "plumbing" or "expert" group: these are fine-grained options
       which shouldn't typically be used by Git users. They still have
       merit though in hosting environments, where requirements are
       typically a lot more specific.

We may also provide different guarantees for both groups. The first one
should definitely be stable, but we might state that the second group is
subject to change in the future.
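
To illustrate the split, here is a minimal sketch of how values could
be sorted into the two groups. This is not code from config.c: the enum
and `classify_fsync_value()` are hypothetical names, and the list of
fine-grained component names is my assumption of what the docs
currently list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical grouping for illustration; not part of Git's config.c. */
enum fsync_value_group {
	FSYNC_VALUE_UNKNOWN,
	FSYNC_VALUE_PORCELAIN, /* abstract values, meant to stay stable */
	FSYNC_VALUE_PLUMBING   /* fine-grained values, subject to change */
};

/* The four abstract values named in the "porcelain" group above. */
static const char *const porcelain_values[] = {
	"committed", "added", "all", "none"
};

/* Assumed component names; check git-config(1) for your version. */
static const char *const plumbing_values[] = {
	"loose-object", "pack", "pack-metadata",
	"commit-graph", "index", "reference"
};

/* Return 1 if `value` matches one of the `nr` entries in `list`. */
static int in_list(const char *value, const char *const *list, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(value, list[i]))
			return 1;
	return 0;
}

/* Classify a single core.fsync value into one of the two groups. */
enum fsync_value_group classify_fsync_value(const char *value)
{
	assert(value != NULL);
	if (in_list(value, porcelain_values,
		    sizeof(porcelain_values) / sizeof(porcelain_values[0])))
		return FSYNC_VALUE_PORCELAIN;
	if (in_list(value, plumbing_values,
		    sizeof(plumbing_values) / sizeof(plumbing_values[0])))
		return FSYNC_VALUE_PLUMBING;
	return FSYNC_VALUE_UNKNOWN;
}
```

With something along these lines, a "subject to change" notice could be
attached only to values in the second group, while the first group
stays free of warnings.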

Patrick