All of lore.kernel.org
 help / color / mirror / Atom feed
From: Son Luong Ngoc <sluongng@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>,
	Derrick Stolee <dstolee@microsoft.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 0/2] Sparse checkout status
Date: Wed, 17 Jun 2020 19:58:50 +0200	[thread overview]
Message-ID: <20200617175850.GA57254@C02YX140LVDN.corpad.adbkng.com> (raw)
In-Reply-To: <CABPp-BHyc=aYqY+YuvNRsFsrMPL6+O=CX37jzXx38_-SXw5gLA@mail.gmail.com>

Hi Elijah,

On Wed, Jun 17, 2020 at 09:48:22AM -0700, Elijah Newren wrote:
> 
> Well, there is `git sparse-checkout list`, assuming users know they
> are in a sparse-checkout, but the whole point of my suggested change
> is that they sometimes don't.

Ah thats true.
This was added recently and definitely slipped my mind often.

> 
> This surprises me; I considered performance while writing it and kept
> it simple on that basis.  In particular:
>   * This does not cause any reading or writing of any extra files; it
> is done solely with information that is already loaded.
>   * If users aren't in a sparse-checkout, their performance overhead
> is a single if-check, which I doubt anyone can measure.
>   * If they are in a sparse-checkout, then they'd get one extra loop
> over files in the index to check the SKIP_WORKTREE bit.
> 
> In which cases would performance implications be a concern?  For a
> very simple point of reference, in a sparse-checkout of the linux
> kernel (using --cone mode and only selecting the drivers/ directory),
> I see the following timings for 'git status' in a clean checkout:
> 
> Without my change:
> [newren@tiger linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
> 1 'git status'
> Benchmark #1: git status
>   Time (mean ± σ):      78.8 ms ±   2.8 ms    [User: 48.9 ms, System: 76.9 ms]
>   Range (min … max):    74.0 ms …  84.0 ms    38 runs
> 
> With my change:
> [newren@eenie linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
> 1 'git status'
> Benchmark #1: git status
>   Time (mean ± σ):      79.8 ms ±   2.7 ms    [User: 49.3 ms, System: 77.7 ms]
>   Range (min … max):    74.8 ms …  84.5 ms    37 runs
> 
> I know the linux kernel is tiny compared to repos like Windows or
> Office, but the relative scaling considerations are identical: it's
> one extra loop through the cached entries checking a bit for each
> entry.  If people are worried about the "extra loop", I could find an
> existing loop to modify and just add an extra if-block in it so that
> we have the same number of loops.  (I'm doubtful that'd actually help,
> but if the concern is an extra loop, it'd certainly avoid that.)
> Anyway, if you've got more information about it being too costly, I'm
> happy to listen.  Otherwise, the overhead seems pretty small to me and
> it's only paid by those who would benefit from the information.
> 
> However, all that said, I have good news: Peff already implemented the
> flag users can use to avoid this extra output, and did so back in
> September of 2009.  It's called "--porcelain".  Automated commands
> should already be using it, and if they aren't, they are what needs
> fixing -- not the long form status output.

When I wrote my initial reaction, the idea of having more than just a
percentage reported back stuck in my mind, specifically with using the
in-tree checkout that I mentioned.

But yeah, that's something down the line to address, you are absolutely
correct that the current patch has no performance impact. Thanks for the
reminder about '--porcelain'.

> 
> I think having a 'git sparse-checkout status' would be a fine
> subcommand, and output like the above -- possibly also including other
> bits Stolee or I mentioned elsewhere in this thread -- would be cool
> and would be helpful; it'd complement what I'm doing here quite
> nicely.
> 
> But you're solving a related problem rather than the one I was
> focusing on, and you have left the issue I was focusing on
> unaddressed.  In particular, if users forgot that they sparsified in
> the first place, how are they going to know to run `git
> sparse-checkout status [--all]`?
> 
> I think having a simple line of output in `git status` would remind
> them.  With that reminder, they could today then go run 'git
> sparse-checkout list' or 'gvfs health' (as Stolee mentioned he uses
> internally) or './sparsify --info' (as I use internally) to get more
> info.  In the future we could provide additional things for them as
> well, such as your 'git sparse-checkout status'.
> 

I do concede that this point could be a separate problem set and addressed
separately in the future.

> 
> An aside, though, since you linked to the in-tree sparse-checkout
> definitions: When I reviewed that series, the possibility of merge
> conflicts and not knowing what sparse-checkout should have checked out
> when the in-tree defintions themselves were in a conflicted state
> seemed to me to be a pretty tough sticking point.  I'm hoping someone
> has a clever solution, but I still don't yet.  Do you?

I am no clever person, but I often take great pleasure in reading up
works of smarter people. One of which is the Google's and Facebook's Mercurial
extension sets that they opensourced a while ago to support large repos.

The test suite for FB's 'sparse' extension[1] may address your concerns?

The 'sparse' extension defines the sparse checkout definition of a
working repository. It supports '--enable-profile' which take in definition
files ('.sparse'). These profiles are often checked into the root dir 
of the repo.

> 
> Thanks,
> Elijah

Regards,
Son Luong.

[1]: https://bitbucket.org/facebook/hg-experimental/src/05ed5d06b353aca69551f3773f56a99994a1a6bf/tests/test-sparse-profiles.t#lines-115


  reply	other threads:[~2020-06-17 17:58 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-17  7:40 [PATCH 0/2] Sparse checkout status Son Luong Ngoc
2020-06-17 16:48 ` Elijah Newren
2020-06-17 17:58   ` Son Luong Ngoc [this message]
2020-06-17 22:36     ` Sparse checkout and recorded dependencies between directories (Was: Re: [PATCH 0/2] Sparse checkout status) Elijah Newren
  -- strict thread matches above, loose matches on Subject: below --
2020-06-16 23:33 [PATCH 0/2] Sparse checkout status Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200617175850.GA57254@C02YX140LVDN.corpad.adbkng.com \
    --to=sluongng@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.