From: Ian Campbell <ijc@hellion.org.uk>
To: gitster@pobox.com
Cc: git@vger.kernel.org
Subject: Re: [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch
Date: Sun, 17 Sep 2017 10:43:23 +0100
Message-ID: <1505641403.22447.6.camel@hellion.org.uk>
In-Reply-To: <20170917073657.31193-4-ijc@hellion.org.uk>

On Sun, 2017-09-17 at 08:36 +0100, Ian Campbell wrote:
> +if test -n "$state_branch"
> +then
> +	echo "Saving rewrite state to $state_branch" 1>&2
> +	state_blob=$(
> +		perl -e'opendir D, "../map" or die;
> +			open H, "|-", "git hash-object -w --stdin" or die;
> +			foreach (sort readdir(D)) {
> +				next if m/^\.\.?$/;
> +				open F, "<../map/$_" or die;
> +				chomp($f = <F>);
> +				print H "$_:$f\n" or die;
> +			}
> +			close(H) or die;' || die "Unable to save state")

One thing I've noticed is that, for a full Linux tree history, the
filter.map file is over 50 MB, which causes GitHub to complain:

    remote: warning: File filter.map is 54.40 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB

(You can simulate a map of that size with `git log --pretty=format:"%H:%H"
upstream/master`.) I suppose that's not a bad recommendation for any
infrastructure, not just GitHub's.

The blob is already compressed in the object store, so there isn't
_much_ point in compressing the map itself (besides, compression only
gets it down to ~30MB, so it would buy only a little time before the
limit is hit again). Instead I'm wondering whether I should look into a
smarter representation, perhaps sharded by the first two characters of
the commit id (as .git/objects is), so that the map is split across
several blobs in a two-level structure.
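A minimal sketch of that two-level layout, assuming the map has already been serialised as `old:new` lines (as the Perl above does): split it into one file per two-character prefix, store each piece with `git hash-object -w`, and list the pieces in a single tree with `git mktree`. The function names `split_map_by_prefix` and `write_sharded_map` are hypothetical and not part of filter-branch.

```shell
# Hypothetical helpers, not part of git-filter-branch.

# split_map_by_prefix MAPFILE OUTDIR
# Splits "old:new" lines into OUTDIR/<first two chars of old>, so a
# full-history map becomes up to 256 smaller files.
split_map_by_prefix () {
	mkdir -p "$2" || return 1
	# Sorting groups each prefix together, so only one output file has
	# to be open at a time (some awks limit the number of open files).
	LC_ALL=C sort "$1" |
	awk -F: -v out="$2" '
	NF == 0 { next }                    # skip blank lines
	{
		p = substr($1, 1, 2)
		if (p != cur) {
			if (cur != "")
				close(out "/" cur)
			cur = p
		}
		print > (out "/" p)
	}'
}

# write_sharded_map OUTDIR
# Stores each shard as a blob and prints the id of a tree whose entries
# are those blobs, named by prefix. Sketch only: a failing
# hash-object is not detected here.
write_sharded_map () {
	for f in "$1"/??
	do
		test -f "$f" || continue
		printf '100644 blob %s\t%s\n' \
			"$(git hash-object -w "$f")" "${f##*/}"
	done | git mktree
}
```

The resulting tree id could then be committed to the state branch in place of the single large blob; each shard stays well under the size limit even as history grows.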

I'm also wondering whether the .git-rewrite/map directory, which will
have 70k+ (and growing) entries for a modern Linux tree, would benefit
from the same treatment. On the other hand, in this case the extra shell
machinations needed to turn abcdef123 into ab/cdef123 might outweigh the
savings in directory lookup time (unless there is already a helper for
that). That assumes directory lookup is even a bottleneck; I haven't
measured it, but anecdotally the commits-per-second rate does seem to
decrease over the course of the filtering process.
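The ab/cdef123 split itself need not run any external command: POSIX parameter expansion can do it inside the shell. A minimal sketch, with a hypothetical helper `shard_path` that stores its result in `$shard` rather than printing it, since calling it via `$(...)` would cost a subshell in shells such as dash and bash. A second hypothetical helper, `make_shard_dirs`, creates all 256 prefix directories once up front, so no per-commit `mkdir` is needed.

```shell
# Hypothetical helper (not in git-filter-branch): turn a commit id such
# as abcdef123 into ab/cdef123 using only parameter expansion.
# The result is left in $shard instead of being printed.
shard_path () {
	shard_rest=${1#??}                    # everything after the first two chars
	shard=${1%"$shard_rest"}/$shard_rest  # first two chars, "/", the rest
}

# Hypothetical helper: create DIR/00 .. DIR/ff once, before filtering,
# so writing a map entry never needs its own mkdir.
make_shard_dirs () {
	for a in 0 1 2 3 4 5 6 7 8 9 a b c d e f
	do
		for b in 0 1 2 3 4 5 6 7 8 9 a b c d e f
		do
			mkdir -p "$1/$a$b" || return 1
		done
	done
}
```

Whether this actually helps depends on directory lookup being the bottleneck, which, as noted above, has not been measured.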

Ian.
