All of lore.kernel.org
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Tao Klerks <tao@klerks.biz>
Cc: git <git@vger.kernel.org>
Subject: Re: Untracked Cache and --untracked-files=all
Date: Tue, 22 Jun 2021 09:06:23 -0700	[thread overview]
Message-ID: <CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com> (raw)
In-Reply-To: <CAPMMpohp6+jW2C0ewfYEp3rrwbKSqGVa94LRgQDcKJvYmiANuA@mail.gmail.com>

Hi,

On Tue, Jun 22, 2021 at 2:25 AM Tao Klerks <tao@klerks.biz> wrote:
>
> Hi folks,
>
> I'm hoping for a little help understanding *intent* around a
> particular code comment in dir.c, and reaching out to the whole list

Oh, boy, dir.c.  We should probably put a simple warning at the top of
the file, something like "Abandon all hope, all ye who enter here".

> because someone (Junio?) said they consider any mail that *doesn't*
> copy the list to be spam anyway, and the original author of the
> comment in question (Duy Nguyen) is no longer active on git.
>
> Context:
> I am trying to explore how to get "--untracked-files=all" to play nice
> with the untracked cache, so that windows users using tooling that
> sets "--untracked-files=all" can benefit from the same
> orders-of-magnitude git status performance improvements as commandline
> users.
>
> There is a "naive" approach to this (store the untracked cache in the
> index file with whatever dir flags were specified/used in the
> recursive walk, and ignore/rewrite that cached data every time the
> flags change), and there is presumably a more "comprehensive" approach
> (store all the information required in the untracked cache to be able
> to satisfy requests with either set of flags - even if this is a
> little more expensive on first run).
>
> The main disadvantage of the "naive" approach is that is every time
> you flip-flop between "git status" and "git status -u", the untracked
> cache is ignored, a recursive directory walk ensues, and the untracked
> cache is rewritten to the index file for the next time you rerun,
> hopefully with the same flags. However, I would think in most
> situations flip-flopping will be less common - more commonly you're
> using a tool or workflow that ends up running the same command(s)
> repeatedly... At least, that's my thesis. I would put this "store the
> untracked cache every time dir flags change" behavior behind a config
> switch, anyway.
>
> This "naive" approach *is* rather easy to achieve - you just need to
> recreate a new "untracked" structure inside
> dir.c#validate_untracked_cache() if you find a mismatch of flags (and
> make other small fixes to store the correct flags in that new
> "untracked" structure).
>
> The one thing that sticks out after making these changes is a code
> comment first introduced in 2015 by Duy Nguyen in ccad261f, explaining
> *why* we refuse to use the untracked cache with "-uall":
> > * See treat_directory(), case index_nonexistent. Without
> > * this flag, we may need to also cache .git file content
> > * for the resolve_gitlink_ref() call, which we don't.
>
> I've seen this comment many times over the past few months, and I've
> previously always interpreted it as a "correctness" concern.
>
> Looking at the original code in question, however (as of ccad261f,
> dir.c#treat_directory(), "resolve_gitlink_ref" call), I don't
> understand how correctness could be impacted.
>
> Fast-forward 6 years and all this code has been substantially
> overhauled by several folks over the years (most recently and majorly
> Elijah Newren), and the "resolve_gitlink_ref()" call is long-gone.

I didn't remember removing any resolve_gitlink_ref() calls, but a
quick search (`git log -Sresolve_gitlink_ref -p -- dir.c`) suggests it
was removed by b22827045e0b (dir: do not traverse repositories with no
commits, 2019-04-09).  A brief look at that commit message certainly
suggests things have changed in a way that _might_ make the original
comment irrelevant.

> Does this look familiar to anyone? Is there any remaining obvious
> reason to be wary of storing the untracked cache structure produced
> with '-uall'?

Untracked cache is its own beast.  I fought with it a bunch, but it's
been a while.  (See e.g.
https://lore.kernel.org/git/CABPp-BEWhavmtDi-ahT+QMWtD6Fe-Ey7cn_82nbetWEQJL=hRA@mail.gmail.com/)

I can say that untracked traversal had all sorts of weirdness and
breakage, which could have made untracked cache harder to make right
and consistent particularly with -uall.  See for example 8d92fb292706
(dir: replace exponential algorithm with a linear one, 2020-04-01) and
perhaps aa6e1b21e5de (dir: avoid unnecessary traversal into ignored
directory, 2021-05-12).  In particular, the exponential traversal
(paths at depth N might be traversed 2^N times) from the former commit
combined with the fact that later traversals often returned a
different status for a path than the first traversal certainly could
have caused some additional weirdness for the untracked cache.

It may be that some combination of Kyle's removal of the
resolve_gitlink_ref() and my removal of the exponential traversal of
untracked paths make your idea safe...but this is dir.c and
untracked-cache, so who really knows?


Not sure if that helps, but I hope one or two pointers I provided are
somehow meaningful to you.

Elijah

  reply	other threads:[~2021-06-22 16:06 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-22  9:25 Untracked Cache and --untracked-files=all Tao Klerks
2021-06-22 16:06 ` Elijah Newren [this message]
2021-06-23  6:42   ` Tao Klerks

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=tao@klerks.biz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.