From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage
Date: Wed, 27 Jan 2021 17:57:21 -0500
Message-ID: <YBHv0ZHZD4VMHLYR@nand.local>
In-Reply-To: <YBHmY7vNxu2hqOa/@coredump.intra.peff.net>

On Wed, Jan 27, 2021 at 05:17:07PM -0500, Jeff King wrote:
> It can sometimes be useful to see which refs are contributing to the
> overall repository size (e.g., does some branch have a bunch of objects
> not found elsewhere in history, which indicates that deleting it would
> shrink the size of a clone).
>
> You can find that out by generating a list of objects, getting their
> sizes from cat-file, and then summing them, like:
>
>     git rev-list --objects main..branch |
>     cut -d' ' -f1 |

I suspect that this is from the original commit message that you wrote a
half-decade ago. Not that it really means much, but you could shave one
process off of this example by passing '--no-object-names' to 'git
rev-list'.

The whole point is that we can avoid having to do this, so I don't think
it really matters, anyway.
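
For illustration, here is a minimal sketch of the two forms side by side.
The throwaway repository, the variable names, and the awk summing step are
assumptions added for this example (the quoted pipeline is cut off before
its summing step, and only says that sizes come from cat-file); the
'--no-object-names' form should produce the same object list with one
process fewer.

```shell
# Throwaway repository so the sketch is runnable anywhere (hypothetical setup).
repo=$(mktemp -d)
git init -q "$repo"
echo hello >"$repo/file"
git -C "$repo" add file
git -C "$repo" -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m initial

# Quoted form: a separate cut process strips the path names.
with_cut=$(git -C "$repo" rev-list --objects HEAD | cut -d' ' -f1)

# One process fewer: rev-list omits the names itself.
without_names=$(git -C "$repo" rev-list --objects --no-object-names HEAD)

# Both variables should hold the same list of object IDs.
[ "$with_cut" = "$without_names" ] && echo "same object list"

# Summing the on-disk sizes through cat-file (awk is an assumed stand-in
# for whatever the original pipeline used to add the numbers up).
total=$(git -C "$repo" rev-list --objects --no-object-names HEAD |
    git -C "$repo" cat-file --batch-check='%(objectsize:disk)' |
    awk '{ sum += $1 } END { print sum }')
echo "$total"
```

The total is the number that the proposed --disk-usage option is meant to
compute without any of the piping.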

> [...]
> then we're faster to generate the list of objects, but we still spend a
> lot of time piping and looking things up. But if we do both together:
>
>   [internal, bitmaps]
>   $ time git rev-list --disk-usage --all --use-bitmap-index
>   1455691059
>   real	0m0.235s
>   user	0m0.186s
>   sys	0m0.049s
>
> then we get the same answer much faster.

Very nice.

> This _could_ be made more flexible, but I didn't think it was worth the
> complexity. Some obvious things one might want are:
>
>   - not counting up all reachable objects (i.e., requiring --objects for
>     this output, and omitting it just counts up commits). This could be
>     handled in the bitmap case with some extra code (OR-ing with the
>     type bitmaps).
>
>     But after 5 years of this patch, I've never wanted that once. The
>     disk usage of just some of the objects isn't really that useful (and
>     of course you can still get it by piping to cat-file).

Yeah. I think it's trivial to support it, but I'm in favor of a simpler
interface.

That said, I worry about painting ourselves into a corner if the default
implies --objects. If we wanted to change that, I'm pretty sure you'd
have to write a rule that says "imply --objects, unless --tags, --blobs,
etc. are specified, and then only do that".

Maybe we'll never have to address that, but it's worth thinking about
before committing to implying '--objects'.

>   - an option to output the sizes of specific objects along with their
>     oids. But if you want to get to this level of flexibility, I think
>     you're better off just using cat-file (and if we are concerned about
>     the pipe costs, we should teach rev-list to understand cat-file's
>     custom formats).

This I agree with completely. Any caller who wants that level of
flexibility shouldn't mind the piping.

I have no comments on the patch itself, which looks fine to me (and
which I have seen over and over again, since it seems to regularly cause
conflicts when merging new releases into GitHub's fork :-)).

Thanks,
Taylor
