From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage
Date: Wed, 27 Jan 2021 17:57:21 -0500
Message-ID: <YBHv0ZHZD4VMHLYR@nand.local>
In-Reply-To: <YBHmY7vNxu2hqOa/@coredump.intra.peff.net>
On Wed, Jan 27, 2021 at 05:17:07PM -0500, Jeff King wrote:
> It can sometimes be useful to see which refs are contributing to the
> overall repository size (e.g., does some branch have a bunch of objects
> not found elsewhere in history, which indicates that deleting it would
> shrink the size of a clone).
>
> You can find that out by generating a list of objects, getting their
> sizes from cat-file, and then summing them, like:
>
> git rev-list --objects main..branch
> cut -d' ' -f1 |
I suspect that this is from the original commit message that you wrote a
half-decade ago. Not that it really means much, but you could shave one
process off of this example by passing '--no-object-names' to 'git
rev-list'.
The whole point is that we can avoid having to do this, so I don't think
it really matters, anyway.
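For reference, a rough sketch of what the shorter pipeline might look
like. The function name is made up for illustration, the cat-file step
is assumed from the description above ("getting their sizes from
cat-file"), and awk is just one way to do the summing:

```shell
# Hypothetical helper: sum the on-disk size of every object in a
# revision range without the intermediate `cut` process.
disk_usage_of_range () {
	# --no-object-names drops the path that --objects would otherwise
	# print after each oid, so the output can be fed straight to cat-file.
	git rev-list --objects --no-object-names "$1" |
	git cat-file --batch-check='%(objectsize:disk)' |
	awk '{ total += $1 } END { print total + 0 }'
}
```

Calling it as `disk_usage_of_range main..branch` should print a single
byte count.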
> [...]
> then we're faster to generate the list of objects, but we still spend a
> lot of time piping and looking things up. But if we do both together:
>
> [internal, bitmaps]
> $ time git rev-list --disk-usage --all --use-bitmap-index
> 1455691059
> real 0m0.235s
> user 0m0.186s
> sys 0m0.049s
>
> then we get the same answer much faster.
Very nice.
> This _could_ be made more flexible, but I didn't think it was worth the
> complexity. Some obvious things one might want are:
>
> - not counting up all reachable objects (i.e., requiring --objects for
> this output, and omitting it just counts up commits). This could be
> handled in the bitmap case with some extra code (OR-ing with the
> type bitmaps).
>
> But after 5 years of this patch, I've never wanted that once. The
> disk usage of just some of the objects isn't really that useful (and
> of course you can still get it by piping to cat-file).
Yeah. I think it's trivial to support it, but I'm in favor of a simpler
interface.
That said, I worry about painting ourselves into a corner if the default
implies --objects. If we wanted to change that, I'm pretty sure you'd
have to write a rule that says "imply --objects, unless --tags, --blobs, etc.
are specified, and then count only those".
Maybe we'll never have to address that, but it's worth thinking about
before committing to implying '--objects'.
> - an option to output the sizes of specific objects along with their
> oids. But if you want to get to this level of flexibility, I think
> you're better off just using cat-file (and if we are concerned about
> the pipe costs, we should teach rev-list to understand cat-file's
> custom formats).
This I agree with completely. Any caller who wants that level of
flexibility shouldn't mind the piping.
I have no comments on the patch itself, which looks fine to me (and which
I have seen over and over again, as it seems to regularly cause conflicts
when merging new releases into GitHub's fork :-)).
Thanks,
Taylor