git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Charles Bailey <charles@hashpling.org>
To: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Fast enumeration of objects
Date: Sun, 21 Jun 2015 20:20:30 +0100	[thread overview]
Message-ID: <1434914431-7745-1-git-send-email-charles@hashpling.org> (raw)
In-Reply-To: <1434705059-2793-1-git-send-email-charles@hashpling.org>

This is a re-casting of my previous filter-objects command but without
any of the filtering so it is now just "list-all-objects".

I have retained the "--verbose" option which outputs the same format as
the default "cat-file --batch-check" as it provides a useful performance
gain to filtering though "cat-file" if this basic information is all
that is needed.

The motivating use case is to enable a script to quickly scan a large
number of repositories for any large objects.

I performed some test timings of some different commands on a clone of
the Linux kernel which was completely packed.

	$ time git rev-list --all --objects |
		cut -d" " -f1 |
		git cat-file --batch-check |
		awk '{if ($3 >= 512000) { print $1 }}' |
		wc -l
	958

	real    0m30.823s
	user    0m41.904s
	sys     0m7.728s

list-all-objects gives a significant improvement:

	$ time git list-all-objects |
		git cat-file --batch-check |
		awk '{if ($3 >= 512000) { print $1 }}' |
		wc -l
	958

	real    0m9.585s
	user    0m10.820s
	sys     0m4.960s

skipping the cat-filter filter is a lesser but still significant
improvement:

	$ time git list-all-objects -v |
		awk '{if ($3 >= 512000) { print $1 }}' |
		wc -l
	958

	real    0m5.637s
	user    0m6.652s
	sys     0m0.156s

The old filter-objects could do the size filter a little be faster, but
not by much:

	$ time git filter-objects --min-size=500k |
		wc -l
	958

	real    0m4.564s
	user    0m4.496s
	sys     0m0.064s

  parent reply	other threads:[~2015-06-21 19:21 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-19  9:10 Improvements to parse-options and a new filter-objects command Charles Bailey
2015-06-19  9:10 ` [PATCH 1/3] Correct test-parse-options to handle negative ints Charles Bailey
2015-06-19 18:28   ` Junio C Hamano
2015-06-19  9:10 ` [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c Charles Bailey
2015-06-19 11:03   ` Remi Galan Alfonso
2015-06-19 11:06     ` Charles Bailey
2015-06-19 17:58   ` Junio C Hamano
2015-06-19 18:39     ` Junio C Hamano
2015-06-20 15:31       ` Jakub Narębski
2015-06-19 18:47     ` Jakub Narębski
2015-06-20 16:51     ` Charles Bailey
2015-06-20 17:47       ` Junio C Hamano
2015-06-19  9:10 ` [PATCH 3/3] Add filter-objects command Charles Bailey
2015-06-19 10:10   ` Jeff King
2015-06-19 10:33     ` Charles Bailey
2015-06-19 10:52       ` Jeff King
2015-06-19 18:28         ` Junio C Hamano
2015-06-19 10:52       ` John Keeping
2015-06-19 11:04         ` Charles Bailey
2015-06-21 18:25 ` Improvements to integer option parsing Charles Bailey
2015-06-21 18:25   ` [PATCH 1/2] Correct test-parse-options to handle negative ints Charles Bailey
2015-06-21 18:25   ` [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c Charles Bailey
2015-06-21 18:30     ` Charles Bailey
2015-06-22 22:03       ` Junio C Hamano
2015-06-22 22:08     ` Junio C Hamano
2015-06-22 22:09   ` Improvements to integer option parsing Junio C Hamano
2015-06-22 22:42     ` Charles Bailey
2015-06-21 19:20 ` Charles Bailey [this message]
2015-06-21 19:20   ` [PATCH] Add list-all-objects command Charles Bailey
2015-06-22  8:38     ` Jeff King
2015-06-22 10:33       ` Jeff King
2015-06-22 10:40         ` [PATCH 1/7] for_each_packed_object: automatically open pack index Jeff King
2015-06-22 10:40         ` [PATCH 2/7] cat-file: minor style fix in options list Jeff King
2015-06-22 10:41         ` [PATCH 3/7] cat-file: move batch_options definition to top of file Jeff King
2015-06-22 10:45         ` [PATCH 4/7] cat-file: add --buffer option Jeff King
2015-06-22 10:45         ` [PATCH 5/7] cat-file: stop returning value from batch_one_object Jeff King
2015-06-22 10:45         ` [PATCH 6/7] cat-file: split batch_one_object into two stages Jeff King
2015-06-22 10:45         ` [PATCH 7/7] cat-file: add --batch-all-objects option Jeff King
2015-06-26  6:56           ` Eric Sunshine
2015-06-26 15:48             ` Jeff King
2015-06-22 11:06         ` [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects Jeff King
2015-06-22 22:03           ` Charles Bailey
2015-06-22 23:46             ` Jeff King
2015-06-22 21:48         ` [PATCH] Add list-all-objects command Charles Bailey
2015-06-22 21:50         ` Junio C Hamano
2015-06-22 23:50           ` Jeff King
2015-06-22 11:38       ` Charles Bailey
2015-06-22  9:57     ` Duy Nguyen
2015-06-22 10:24       ` Jeff King
2015-06-22  8:35   ` Fast enumeration of objects Jeff King
2015-06-22 19:44     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1434914431-7745-1-git-send-email-charles@hashpling.org \
    --to=charles@hashpling.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).