All of lore.kernel.org
 help / color / mirror / Atom feed
* Improvements to parse-options and a new filter-objects command
@ 2015-06-19  9:10 Charles Bailey
  2015-06-19  9:10 ` [PATCH 1/3] Correct test-parse-options to handle negative ints Charles Bailey
                   ` (4 more replies)
  0 siblings, 5 replies; 51+ messages in thread
From: Charles Bailey @ 2015-06-19  9:10 UTC (permalink / raw)
  To: Junio Hamano, git

In my team we've been looking for a fast way to check a large number of
repositories for large files, which are typically unintentionally checked in
binaries, so that we can warn repository owners and help them tidy up as
desired.

There seem to be two main approaches to scripting this. The first is to do
something revision-walk based such as `log --numstat` and the second is to scan
pack files using `verify-pack -v` and either to ensure that everything is packed
or scan loose objects separately.

The revision walking tends to be slow and parsing verify-pack -v is awkward
not only because of the need to take account of multiple packs and loose
objects, but also because it is porcelainish. For example, at some point it
gained a delta chain summary which needs to be snipped before the list of
packed objects can be sorted and used.

The third patch in this series adds a new built in which makes this simple and
fast. While implementing it, I found a couple of other improvements which I
think stand alone.

[PATCH 1/3] Correct test-parse-options to handle negative ints

I noticed that a printf in test-parse-options was using %u instead of %d for an
int with the consequence that it wouldn't ever print a negative value correctly.
I don't know that we do ever parse a negative integer as an option, but there's
no reason that it shouldn't work so I fixed it and added a trivial test.

[PATCH 2/3] Move unsigned long option parsing out of pack-objects.c

I wanted to be able to parse options like --min-size=500k in my new command so I
started to add OPT_ULONG, only to realise that it already existed but was
private to pack-objects. I added OPT_ULONG support to parse-options based on the
existing OPT_INTEGER code, added new tests and changed pack-objects to use this
instead.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 1/3] Correct test-parse-options to handle negative ints
  2015-06-19  9:10 Improvements to parse-options and a new filter-objects command Charles Bailey
@ 2015-06-19  9:10 ` Charles Bailey
  2015-06-19 18:28   ` Junio C Hamano
  2015-06-19  9:10 ` [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c Charles Bailey
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 51+ messages in thread
From: Charles Bailey @ 2015-06-19  9:10 UTC (permalink / raw)
  To: Junio Hamano, git

From: Charles Bailey <cbailey32@bloomberg.net>

Fix the printf specification to treat 'integer' as the signed type that
it is and add a test that checks that we parse negative option
arguments.

Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
---
 t/t0040-parse-options.sh | 2 ++
 test-parse-options.c     | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh
index b044785..ecb7417 100755
--- a/t/t0040-parse-options.sh
+++ b/t/t0040-parse-options.sh
@@ -151,6 +151,8 @@ test_expect_success 'short options' '
 	test_must_be_empty output.err
 '
 
+test_expect_success 'OPT_INT() negative' 'check integer: -2345 -i -2345'
+
 cat > expect << EOF
 boolean: 2
 integer: 1729
diff --git a/test-parse-options.c b/test-parse-options.c
index 5dabce6..7c492cf 100644
--- a/test-parse-options.c
+++ b/test-parse-options.c
@@ -82,7 +82,7 @@ int main(int argc, char **argv)
 	argc = parse_options(argc, (const char **)argv, prefix, options, usage, 0);
 
 	printf("boolean: %d\n", boolean);
-	printf("integer: %u\n", integer);
+	printf("integer: %d\n", integer);
 	printf("timestamp: %lu\n", timestamp);
 	printf("string: %s\n", string ? string : "(not set)");
 	printf("abbrev: %d\n", abbrev);
-- 
2.4.0.53.g8440f74

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c
  2015-06-19  9:10 Improvements to parse-options and a new filter-objects command Charles Bailey
  2015-06-19  9:10 ` [PATCH 1/3] Correct test-parse-options to handle negative ints Charles Bailey
@ 2015-06-19  9:10 ` Charles Bailey
  2015-06-19 11:03   ` Remi Galan Alfonso
  2015-06-19 17:58   ` Junio C Hamano
  2015-06-19  9:10 ` [PATCH 3/3] Add filter-objects command Charles Bailey
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 51+ messages in thread
From: Charles Bailey @ 2015-06-19  9:10 UTC (permalink / raw)
  To: Junio Hamano, git

From: Charles Bailey <cbailey32@bloomberg.net>

The unsigned long option parsing (including 'k'/'m'/'g' suffix parsing)
is more widely applicable. Add support for OPT_ULONG to parse-options.h
and change pack-objects.c use this support.

Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
---
 builtin/pack-objects.c   | 17 -----------------
 parse-options.c          | 15 +++++++++++++++
 parse-options.h          |  5 ++++-
 t/t0040-parse-options.sh | 46 ++++++++++++++++++++++++++++++++++++++++++----
 test-parse-options.c     |  3 +++
 5 files changed, 64 insertions(+), 22 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 80fe8c7..5de76db 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2588,23 +2588,6 @@ static int option_parse_unpack_unreachable(const struct option *opt,
 	return 0;
 }
 
-static int option_parse_ulong(const struct option *opt,
-			      const char *arg, int unset)
-{
-	if (unset)
-		die(_("option %s does not accept negative form"),
-		    opt->long_name);
-
-	if (!git_parse_ulong(arg, opt->value))
-		die(_("unable to parse value '%s' for option %s"),
-		    arg, opt->long_name);
-	return 0;
-}
-
-#define OPT_ULONG(s, l, v, h) \
-	{ OPTION_CALLBACK, (s), (l), (v), "n", (h),	\
-	  PARSE_OPT_NONEG, option_parse_ulong }
-
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
diff --git a/parse-options.c b/parse-options.c
index 80106c0..76a5c3e 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -180,6 +180,21 @@ static int get_value(struct parse_opt_ctx_t *p,
 			return opterror(opt, "expects a numerical value", flags);
 		return 0;
 
+	case OPTION_ULONG:
+		if (unset) {
+			*(unsigned long *)opt->value = 0;
+			return 0;
+		}
+		if (opt->flags & PARSE_OPT_OPTARG && !p->opt) {
+			*(unsigned long *)opt->value = opt->defval;
+			return 0;
+		}
+		if (get_arg(p, opt, flags, &arg))
+			return -1;
+		if (!git_parse_ulong(arg, opt->value))
+			return opterror(opt, "expects a numerical value", flags);
+		return 0;
+
 	default:
 		die("should not happen, someone must be hit on the forehead");
 	}
diff --git a/parse-options.h b/parse-options.h
index c71e9da..2ddb26f 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -18,7 +18,8 @@ enum parse_opt_type {
 	OPTION_INTEGER,
 	OPTION_CALLBACK,
 	OPTION_LOWLEVEL_CALLBACK,
-	OPTION_FILENAME
+	OPTION_FILENAME,
+	OPTION_ULONG
 };
 
 enum parse_opt_flags {
@@ -129,6 +130,8 @@ struct option {
 #define OPT_CMDMODE(s, l, v, h, i) { OPTION_CMDMODE, (s), (l), (v), NULL, \
 				      (h), PARSE_OPT_NOARG|PARSE_OPT_NONEG, NULL, (i) }
 #define OPT_INTEGER(s, l, v, h)     { OPTION_INTEGER, (s), (l), (v), N_("n"), (h) }
+#define OPT_ULONG(s, l, v, h)       { OPTION_ULONG, (s), (l), (v), N_("n"), \
+				      (h), PARSE_OPT_NONEG }
 #define OPT_STRING(s, l, v, a, h)   { OPTION_STRING,  (s), (l), (v), (a), (h) }
 #define OPT_STRING_LIST(s, l, v, a, h) \
 				    { OPTION_CALLBACK, (s), (l), (v), (a), \
diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh
index ecb7417..55b3dba 100755
--- a/t/t0040-parse-options.sh
+++ b/t/t0040-parse-options.sh
@@ -19,6 +19,8 @@ usage: test-parse-options <options>
 
     -i, --integer <n>     get a integer
     -j <n>                get a integer, too
+    -u, --unsigned-long <n>
+                          get an unsigned long
     --set23               set integer to 23
     -t <time>             get timestamp of <time>
     -L, --length <str>    get length of <str>
@@ -58,6 +60,7 @@ mv expect expect.err
 cat >expect.template <<EOF
 boolean: 0
 integer: 0
+unsigned long: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -132,9 +135,32 @@ test_expect_success 'OPT_BOOL() no negation #2' 'check_unknown_i18n --no-no-fear
 
 test_expect_success 'OPT_BOOL() positivation' 'check boolean: 0 -D --doubt'
 
+test_expect_success 'OPT_INT() negative' 'check integer: -2345 -i -2345'
+
+test_expect_success 'OPT_ULONG() simple' '
+	check "unsigned long:" 2345678 -u 2345678
+'
+
+test_expect_success 'OPT_ULONG() kilo' '
+	check "unsigned long:" 239616 -u 234k
+'
+
+test_expect_success 'OPT_ULONG() mega' '
+	check "unsigned long:" 104857600 -u 100m
+'
+
+test_expect_success 'OPT_ULONG() giga' '
+	check "unsigned long:" 1073741824 -u 1g
+'
+
+test_expect_success 'OPT_ULONG() 3giga' '
+	check "unsigned long:" 3221225472 -u 3g
+'
+
 cat > expect << EOF
 boolean: 2
 integer: 1729
+unsigned long: 16384
 timestamp: 0
 string: 123
 abbrev: 7
@@ -145,7 +171,7 @@ file: prefix/my.file
 EOF
 
 test_expect_success 'short options' '
-	test-parse-options -s123 -b -i 1729 -b -vv -n -F my.file \
+	test-parse-options -s123 -b -i 1729 -u 16k -b -vv -n -F my.file \
 	> output 2> output.err &&
 	test_cmp expect output &&
 	test_must_be_empty output.err
@@ -156,6 +182,7 @@ test_expect_success 'OPT_INT() negative' 'check integer: -2345 -i -2345'
 cat > expect << EOF
 boolean: 2
 integer: 1729
+unsigned long: 16384
 timestamp: 0
 string: 321
 abbrev: 10
@@ -166,9 +193,10 @@ file: prefix/fi.le
 EOF
 
 test_expect_success 'long options' '
-	test-parse-options --boolean --integer 1729 --boolean --string2=321 \
-		--verbose --verbose --no-dry-run --abbrev=10 --file fi.le\
-		--obsolete > output 2> output.err &&
+	test-parse-options --boolean --integer 1729 --unsigned-long 16k \
+		--boolean --string2=321 --verbose --verbose --no-dry-run \
+		--abbrev=10 --file fi.le --obsolete \
+		> output 2> output.err &&
 	test_must_be_empty output.err &&
 	test_cmp expect output
 '
@@ -182,6 +210,7 @@ test_expect_success 'missing required value' '
 cat > expect << EOF
 boolean: 1
 integer: 13
+unsigned long: 0
 timestamp: 0
 string: 123
 abbrev: 7
@@ -204,6 +233,7 @@ test_expect_success 'intermingled arguments' '
 cat > expect << EOF
 boolean: 0
 integer: 2
+unsigned long: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -232,6 +262,7 @@ test_expect_success 'ambiguously abbreviated option' '
 cat > expect << EOF
 boolean: 0
 integer: 0
+unsigned long: 0
 timestamp: 0
 string: 123
 abbrev: 7
@@ -270,6 +301,7 @@ test_expect_success 'detect possible typos' '
 cat > expect <<EOF
 boolean: 0
 integer: 0
+unsigned long: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -289,6 +321,7 @@ test_expect_success 'keep some options as arguments' '
 cat > expect <<EOF
 boolean: 0
 integer: 0
+unsigned long: 0
 timestamp: 1
 string: (not set)
 abbrev: 7
@@ -310,6 +343,7 @@ cat > expect <<EOF
 Callback: "four", 0
 boolean: 5
 integer: 4
+unsigned long: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -338,6 +372,7 @@ test_expect_success 'OPT_CALLBACK() and callback errors work' '
 cat > expect <<EOF
 boolean: 1
 integer: 23
+unsigned long: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -362,6 +397,7 @@ test_expect_success 'OPT_NEGBIT() and OPT_SET_INT() work' '
 cat > expect <<EOF
 boolean: 6
 integer: 0
+unsigned long: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -392,6 +428,7 @@ test_expect_success 'OPT_COUNTUP() with PARSE_OPT_NODASH works' '
 cat > expect <<EOF
 boolean: 0
 integer: 12345
+unsigned long: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -410,6 +447,7 @@ test_expect_success 'OPT_NUMBER_CALLBACK() works' '
 cat >expect <<EOF
 boolean: 0
 integer: 0
+unsigned long: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
diff --git a/test-parse-options.c b/test-parse-options.c
index 7c492cf..e592d7e 100644
--- a/test-parse-options.c
+++ b/test-parse-options.c
@@ -4,6 +4,7 @@
 
 static int boolean = 0;
 static int integer = 0;
+static unsigned long unsigned_long = 0;
 static unsigned long timestamp;
 static int abbrev = 7;
 static int verbose = 0, dry_run = 0, quiet = 0;
@@ -48,6 +49,7 @@ int main(int argc, char **argv)
 		OPT_GROUP(""),
 		OPT_INTEGER('i', "integer", &integer, "get a integer"),
 		OPT_INTEGER('j', NULL, &integer, "get a integer, too"),
+		OPT_ULONG('u', "unsigned-long", &unsigned_long, "get an unsigned long"),
 		OPT_SET_INT(0, "set23", &integer, "set integer to 23", 23),
 		OPT_DATE('t', NULL, &timestamp, "get timestamp of <time>"),
 		OPT_CALLBACK('L', "length", &integer, "str",
@@ -83,6 +85,7 @@ int main(int argc, char **argv)
 
 	printf("boolean: %d\n", boolean);
 	printf("integer: %d\n", integer);
+	printf("unsigned long: %lu\n", unsigned_long);
 	printf("timestamp: %lu\n", timestamp);
 	printf("string: %s\n", string ? string : "(not set)");
 	printf("abbrev: %d\n", abbrev);
-- 
2.4.0.53.g8440f74

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 3/3] Add filter-objects command
  2015-06-19  9:10 Improvements to parse-options and a new filter-objects command Charles Bailey
  2015-06-19  9:10 ` [PATCH 1/3] Correct test-parse-options to handle negative ints Charles Bailey
  2015-06-19  9:10 ` [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c Charles Bailey
@ 2015-06-19  9:10 ` Charles Bailey
  2015-06-19 10:10   ` Jeff King
  2015-06-21 18:25 ` Improvements to integer option parsing Charles Bailey
  2015-06-21 19:20 ` Fast enumeration of objects Charles Bailey
  4 siblings, 1 reply; 51+ messages in thread
From: Charles Bailey @ 2015-06-19  9:10 UTC (permalink / raw)
  To: Junio Hamano, git

From: Charles Bailey <cbailey32@bloomberg.net>

filter-objects is a command to scan all objects in the object database
for the repository and print the ids of those which match the given
criteria.

The current supported criteria are object type and the minimum size of
the object.

The guiding use case is to scan repositories quickly for large objects
which may cause performance issues for users. The list of objects can
then be used to guide some future remediating action.

Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
---
 Documentation/git-filter-objects.txt | 38 +++++++++++++++++++
 Makefile                             |  1 +
 builtin.h                            |  1 +
 builtin/filter-objects.c             | 73 ++++++++++++++++++++++++++++++++++++
 git.c                                |  1 +
 t/t8100-filter-objects.sh            | 67 +++++++++++++++++++++++++++++++++
 6 files changed, 181 insertions(+)
 create mode 100644 Documentation/git-filter-objects.txt
 create mode 100644 builtin/filter-objects.c
 create mode 100755 t/t8100-filter-objects.sh

diff --git a/Documentation/git-filter-objects.txt b/Documentation/git-filter-objects.txt
new file mode 100644
index 0000000..c10ca01
--- /dev/null
+++ b/Documentation/git-filter-objects.txt
@@ -0,0 +1,38 @@
+git-filter-objects(1)
+=====================
+
+NAME
+----
+git-filter-objects - Scan through all objects in the repository and print those
+matching a given filter
+
+
+SYNOPSIS
+--------
+[verse]
+'git filter-objects' [-t <type> | --type=<type>] [--min-size=<size>]
+	[-v|--verbose]
+
+DESCRIPTION
+-----------
+Scans all objects in a repository - including any unreachable objects - and
+print out the ids of all matching objects.  If `--verbose` is specified then
+the object type and size is printed out as well as its id.
+
+OPTIONS
+-------
+-t::
+--type::
+	Only list objects whose type matches <type>.
+
+--min-size::
+	Only list objects whose size exceeds <size> bytes.
+
+-v::
+--verbose::
+	Output in the followin format instead of just printing object ids:
+	<sha1> SP <type> SP <size>
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 149f1c7..a7c017f 100644
--- a/Makefile
+++ b/Makefile
@@ -842,6 +842,7 @@ BUILTIN_OBJS += builtin/diff.o
 BUILTIN_OBJS += builtin/fast-export.o
 BUILTIN_OBJS += builtin/fetch-pack.o
 BUILTIN_OBJS += builtin/fetch.o
+BUILTIN_OBJS += builtin/filter-objects.o
 BUILTIN_OBJS += builtin/fmt-merge-msg.o
 BUILTIN_OBJS += builtin/for-each-ref.o
 BUILTIN_OBJS += builtin/fsck.o
diff --git a/builtin.h b/builtin.h
index b87df70..5a15693 100644
--- a/builtin.h
+++ b/builtin.h
@@ -62,6 +62,7 @@ extern int cmd_diff_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_fast_export(int argc, const char **argv, const char *prefix);
 extern int cmd_fetch(int argc, const char **argv, const char *prefix);
 extern int cmd_fetch_pack(int argc, const char **argv, const char *prefix);
+extern int cmd_filter_objects(int argc, const char **argv, const char *prefix);
 extern int cmd_fmt_merge_msg(int argc, const char **argv, const char *prefix);
 extern int cmd_for_each_ref(int argc, const char **argv, const char *prefix);
 extern int cmd_format_patch(int argc, const char **argv, const char *prefix);
diff --git a/builtin/filter-objects.c b/builtin/filter-objects.c
new file mode 100644
index 0000000..c40d621
--- /dev/null
+++ b/builtin/filter-objects.c
@@ -0,0 +1,73 @@
+#include "cache.h"
+#include "builtin.h"
+#include "revision.h"
+#include "parse-options.h"
+
+#include <stdio.h>
+
+static int req_type;
+static unsigned long min_size;
+static int verbose;
+
+static int check_object(const unsigned char *sha1)
+{
+	unsigned long size;
+	int type = sha1_object_info(sha1, &size);
+
+	if (type < 0)
+		return -1;
+
+	if (size >= min_size && (!req_type || type == req_type)) {
+		if (verbose)
+			printf("%s %s %lu\n", sha1_to_hex(sha1), typename(type), size);
+		else
+			printf("%s\n", sha1_to_hex(sha1));
+	}
+
+	return 0;
+}
+
+static int check_loose_object(const unsigned char *sha1,
+			      const char *path,
+			      void *data)
+{
+	return check_object(sha1);
+}
+
+static int check_packed_object(const unsigned char *sha1,
+			       struct packed_git *pack,
+			       uint32_t pos,
+			       void *data)
+{
+	return check_object(sha1);
+}
+
+static char *opt_type;
+static struct option builtin_filter_objects_options[] = {
+	OPT_ULONG(0, "min-size", &min_size, "minimum size of object to show"),
+	OPT_STRING('t', "type", &opt_type, NULL, "type of objects to show"),
+	OPT__VERBOSE(&verbose, "show object type and size"),
+	OPT_END()
+};
+
+int cmd_filter_objects(int argc, const char **argv, const char *prefix)
+{
+	struct packed_git *p;
+
+	argc = parse_options(argc, argv, prefix, builtin_filter_objects_options,
+			     NULL, 0);
+
+	if (opt_type)
+		req_type = type_from_string(opt_type);
+
+	for_each_loose_object(check_loose_object, NULL, 0);
+
+	prepare_packed_git();
+	for (p = packed_git; p; p = p->next) {
+		open_pack_index(p);
+	}
+
+	for_each_packed_object(check_packed_object, NULL, 0);
+
+	return 0;
+}
diff --git a/git.c b/git.c
index 44374b1..4c87afd 100644
--- a/git.c
+++ b/git.c
@@ -403,6 +403,7 @@ static struct cmd_struct commands[] = {
 	{ "fast-export", cmd_fast_export, RUN_SETUP },
 	{ "fetch", cmd_fetch, RUN_SETUP },
 	{ "fetch-pack", cmd_fetch_pack, RUN_SETUP },
+	{ "filter-objects", cmd_filter_objects, RUN_SETUP },
 	{ "fmt-merge-msg", cmd_fmt_merge_msg, RUN_SETUP },
 	{ "for-each-ref", cmd_for_each_ref, RUN_SETUP },
 	{ "format-patch", cmd_format_patch, RUN_SETUP },
diff --git a/t/t8100-filter-objects.sh b/t/t8100-filter-objects.sh
new file mode 100755
index 0000000..4b0137b
--- /dev/null
+++ b/t/t8100-filter-objects.sh
@@ -0,0 +1,67 @@
+#!/bin/sh
+
+test_description='git filter-objects'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	echo hello, world >file &&
+	git add file &&
+	git commit -m "initial"
+'
+
+test_expect_success 'filter by type' '
+	git rev-parse HEAD >expected &&
+	git filter-objects -t commit >result &&
+	test_cmp expected result &&
+	git rev-parse HEAD:file >expected &&
+	git filter-objects -t blob >result &&
+	test_cmp expected result &&
+	git rev-parse HEAD^{tree} >expected &&
+	git filter-objects -t tree >result &&
+	test_cmp expected result
+'
+
+test_expect_success 'filter by type after pack' '
+	git repack -Ad &&
+	git rev-parse HEAD >expected &&
+	git filter-objects -t commit >result &&
+	test_cmp expected result &&
+	git rev-parse HEAD:file >expected &&
+	git filter-objects -t blob >result &&
+	test_cmp expected result &&
+	git rev-parse HEAD^{tree} >expected &&
+	git filter-objects -t tree >result &&
+	test_cmp expected result
+'
+
+test_expect_success 'verbose output' '
+	echo $(git rev-parse HEAD) commit $(git cat-file -s HEAD) >expected &&
+	git filter-objects -v -t commit >result &&
+	test_cmp expected result &&
+	echo $(git rev-parse HEAD:file) blob $(git cat-file -s HEAD:file) >expected &&
+	git filter-objects -v -t blob >result &&
+	test_cmp expected result &&
+	echo $(git rev-parse HEAD^{tree}) tree $(git cat-file -s HEAD^{tree}) >expected &&
+	git filter-objects -v -t tree >result &&
+	test_cmp expected result
+'
+
+test_expect_success 'filter on size' '
+	git commit -F - --allow-empty <<-\EOF &&
+		This is a reasonably long commit message
+
+		It is designed to make sure that we create an object
+		that is substantially larger than all the others.
+
+		Our test file blob is a few bytes, our tree is similarly
+		small and our first commit is not too big.
+
+		This message alone is about 300 characters and a sample
+		commit from it has been measured at 562 bytes.
+	EOF
+	git rev-parse HEAD >expected &&
+	git filter-objects --min-size=500 >result &&
+	test_cmp expected result
+'
+
+test_done
-- 
2.4.0.53.g8440f74

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/3] Add filter-objects command
  2015-06-19  9:10 ` [PATCH 3/3] Add filter-objects command Charles Bailey
@ 2015-06-19 10:10   ` Jeff King
  2015-06-19 10:33     ` Charles Bailey
  0 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2015-06-19 10:10 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio Hamano, git

On Fri, Jun 19, 2015 at 10:10:59AM +0100, Charles Bailey wrote:

> filter-objects is a command to scan all objects in the object database
> for the repository and print the ids of those which match the given
> criteria.
> 
> The current supported criteria are object type and the minimum size of
> the object.
> 
> The guiding use case is to scan repositories quickly for large objects
> which may cause performance issues for users. The list of objects can
> then be used to guide some future remediating action.

I've had to perform this exact same task. You can already do the
"filtering" part pretty easily and efficiently with cat-file and a perl
script, like:

  magically_generate_all_objects |
  git cat-file --batch-check='%(objectsize) %(objectname)' |
  perl -alne 'print $F[1] if $F[0] > 1234'

That's not as friendly as your filter-objects, but it's a lot more
flexible (since you can ask cat-file for all sorts of information).

Obviously I've glossed over the "how to get a list of objects" part.
If you truly want all objects (not just reachable ones), or if "rev-list
--objects" is too slow, the best way is:

  objects() {
    # loose objects
    for i in objects/??/*; do
       echo $i
    done |
    sed 's,objects/\(..\)/,\1,'

    # packed objects
    for i in objects/pack/*.idx; do
      git show-index <$i
    done |
    cut -d' ' -f2
  }

Certainly I'm not opposed to doing something less horrible there (and I
am happy to see my for_each_*_object interface getting more callers!).
I kind of wonder if we should make "all objects, reachable or not" an
option for rev-list. I'm not sure if it would choke on adding them all
to the "pending" list, though; it's not really made for that. But it
would enable neat things like:

  git rev-list --all-the-objects --not --all

to show you what's unreachable.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/3] Add filter-objects command
  2015-06-19 10:10   ` Jeff King
@ 2015-06-19 10:33     ` Charles Bailey
  2015-06-19 10:52       ` Jeff King
  2015-06-19 10:52       ` John Keeping
  0 siblings, 2 replies; 51+ messages in thread
From: Charles Bailey @ 2015-06-19 10:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio Hamano, git

On Fri, Jun 19, 2015 at 06:10:10AM -0400, Jeff King wrote:
> On Fri, Jun 19, 2015 at 10:10:59AM +0100, Charles Bailey wrote:
> 
> > filter-objects is a command to scan all objects in the object database
> > for the repository and print the ids of those which match the given
> > criteria.
> > 
> > The current supported criteria are object type and the minimum size of
> > the object.
> > 
> > The guiding use case is to scan repositories quickly for large objects
> > which may cause performance issues for users. The list of objects can
> > then be used to guide some future remediating action.
> 
> I've had to perform this exact same task. You can already do the
> "filtering" part pretty easily and efficiently with cat-file and a perl
> script, like:
> 
>   magically_generate_all_objects |
>   git cat-file --batch-check='%(objectsize) %(objectname)' |
>   perl -alne 'print $F[1] if $F[0] > 1234'
> 
> That's not as friendly as your filter-objects, but it's a lot more
> flexible (since you can ask cat-file for all sorts of information).
> 
> Obviously I've glossed over the "how to get a list of objects" part.
> If you truly want all objects (not just reachable ones), or if "rev-list
> --objects" is too slow [...]

So, yes, performance is definitely an issue and I could have called this
command "git magically-generate-all-object-for-scripts" but then, as it
was so easy to provide exactly the filtering that I was looking for in
the C code, I thought I would do that as well and then "filter-objects"
("filter-all-objects"?) seemed like a better name.

It's about an order of magnitude faster on the systems I've checked to
do a parameterless filter-objects then rev-list --all --objects,
although I understand they do different things.

I am also thinking about another piece that answers the question: "which
commits introduce any of (or the first of) this list of objects?". This
can be done by parseing a diff --raw for commits but I think it should
be possible to do this faster, too.

Charles.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/3] Add filter-objects command
  2015-06-19 10:33     ` Charles Bailey
@ 2015-06-19 10:52       ` Jeff King
  2015-06-19 18:28         ` Junio C Hamano
  2015-06-19 10:52       ` John Keeping
  1 sibling, 1 reply; 51+ messages in thread
From: Jeff King @ 2015-06-19 10:52 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio Hamano, git

On Fri, Jun 19, 2015 at 11:33:24AM +0100, Charles Bailey wrote:

> > Obviously I've glossed over the "how to get a list of objects" part.
> > If you truly want all objects (not just reachable ones), or if "rev-list
> > --objects" is too slow [...]
> 
> So, yes, performance is definitely an issue and I could have called this
> command "git magically-generate-all-object-for-scripts" but then, as it
> was so easy to provide exactly the filtering that I was looking for in
> the C code, I thought I would do that as well and then "filter-objects"
> ("filter-all-objects"?) seemed like a better name.

Right, my point was only that it works for _your_ particular filter, but
it would be nice to have something more general. And we already have
"cat-file --batch-check". IOW, I think I would prefer the "magical" form
because it's a better scripting building block. As you note,
"filter-objects" without any filters is exactly that. Your 10 extra
lines of C code are not exactly bloat, but I just wonder if other people
will find it all that useful.

> It's about an order of magnitude faster on the systems I've checked to
> do a parameterless filter-objects then rev-list --all --objects,
> although I understand they do different things.

Right, it's the object-opening and hash lookups that kill you in
"rev-list", because it's actually walking the graph.

> I am also thinking about another piece that answers the question: "which
> commits introduce any of (or the first of) this list of objects?". This
> can be done by parseing a diff --raw for commits but I think it should
> be possible to do this faster, too.

If you care about "introduce", I think you have to traverse and do the
diffs. If you only care about "contains" (for example, because you want
to know which path the blob is found at), you can find trees which
mention it, then trees which mention that tree, and so on. I think that
ends up slower in practice, though.

I have patches that implement a "rev-list --find=$sha1", which sets a
bit on $sha1 and then traverses with --objects until we find it (or
them; you can specify multiple). It's pretty straightforward, but it
does cost as much as "git rev-list --objects" in the worst case. Let me
know if you're interested and I can clean it up and post it.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/3] Add filter-objects command
  2015-06-19 10:33     ` Charles Bailey
  2015-06-19 10:52       ` Jeff King
@ 2015-06-19 10:52       ` John Keeping
  2015-06-19 11:04         ` Charles Bailey
  1 sibling, 1 reply; 51+ messages in thread
From: John Keeping @ 2015-06-19 10:52 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Jeff King, Junio Hamano, git

On Fri, Jun 19, 2015 at 11:33:24AM +0100, Charles Bailey wrote:
> So, yes, performance is definitely an issue and I could have called this
> command "git magically-generate-all-object-for-scripts" but then, as it
> was so easy to provide exactly the filtering that I was looking for in
> the C code, I thought I would do that as well and then "filter-objects"
> ("filter-all-objects"?) seemed like a better name.

By analogy with "git filter-branch", I don't think "filter-objects" is a
good name here.  My preference would be "ls-objects".

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c
  2015-06-19  9:10 ` [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c Charles Bailey
@ 2015-06-19 11:03   ` Remi Galan Alfonso
  2015-06-19 11:06     ` Charles Bailey
  2015-06-19 17:58   ` Junio C Hamano
  1 sibling, 1 reply; 51+ messages in thread
From: Remi Galan Alfonso @ 2015-06-19 11:03 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio Hamano, git

Charles Bailey <cbailey32@bloomberg.net> writes:
> test_expect_success 'long options' '
> - test-parse-options --boolean --integer 1729 --boolean --string2=321 \
> - --verbose --verbose --no-dry-run --abbrev=10 --file fi.le\
> - --obsolete > output 2> output.err &&
> + test-parse-options --boolean --integer 1729 --unsigned-long 16k \
> + --boolean --string2=321 --verbose --verbose --no-dry-run \
> + --abbrev=10 --file fi.le --obsolete \
> + > output 2> output.err &&
> test_must_be_empty output.err &&
> test_cmp expect output
> '

It's trivial matter but the line:
> + > output 2> output.err &&
should be written:
> + >output 2>output.err &&

It was incorrectly written before but since 
you are modifying the line, it might be a 
good thing to change it now.

Rémi

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/3] Add filter-objects command
  2015-06-19 10:52       ` John Keeping
@ 2015-06-19 11:04         ` Charles Bailey
  0 siblings, 0 replies; 51+ messages in thread
From: Charles Bailey @ 2015-06-19 11:04 UTC (permalink / raw)
  To: John Keeping; +Cc: Jeff King, Junio Hamano, git

On Fri, Jun 19, 2015 at 11:52:28AM +0100, John Keeping wrote:
> On Fri, Jun 19, 2015 at 11:33:24AM +0100, Charles Bailey wrote:
> > So, yes, performance is definitely an issue and I could have called this
> > command "git magically-generate-all-object-for-scripts" but then, as it
> > was so easy to provide exactly the filtering that I was looking for in
> > the C code, I thought I would do that as well and then "filter-objects"
> > ("filter-all-objects"?) seemed like a better name.
> 
> By analogy with "git filter-branch", I don't think "filter-objects" is a
> good name here.  My preference would be "ls-objects".

I like that because it emphasises why I wrote it, the very basic
filtering is a nice additional feature.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c
  2015-06-19 11:03   ` Remi Galan Alfonso
@ 2015-06-19 11:06     ` Charles Bailey
  0 siblings, 0 replies; 51+ messages in thread
From: Charles Bailey @ 2015-06-19 11:06 UTC (permalink / raw)
  To: Remi Galan Alfonso; +Cc: Junio Hamano, git

On Fri, Jun 19, 2015 at 01:03:25PM +0200, Remi Galan Alfonso wrote:
> 
> It's trivial matter but the line:
> > + > output 2> output.err &&
> should be written:
> > + >output 2>output.err &&
> 
> It was incorrectly written before but since 
> you are modifying the line, it might be a 
> good thing to change it now.

Yes, I can fold this in. I just changed the wrapping and didn't spot
this style error.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c
  2015-06-19  9:10 ` [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c Charles Bailey
  2015-06-19 11:03   ` Remi Galan Alfonso
@ 2015-06-19 17:58   ` Junio C Hamano
  2015-06-19 18:39     ` Junio C Hamano
                       ` (2 more replies)
  1 sibling, 3 replies; 51+ messages in thread
From: Junio C Hamano @ 2015-06-19 17:58 UTC (permalink / raw)
  To: Charles Bailey; +Cc: git

Charles Bailey <charles@hashpling.org> writes:

> diff --git a/parse-options.h b/parse-options.h
> index c71e9da..2ddb26f 100644
> --- a/parse-options.h
> +++ b/parse-options.h
> @@ -18,7 +18,8 @@ enum parse_opt_type {
>  	OPTION_INTEGER,
>  	OPTION_CALLBACK,
>  	OPTION_LOWLEVEL_CALLBACK,
> -	OPTION_FILENAME
> +	OPTION_FILENAME,
> +	OPTION_ULONG
>  };

Please place it immediately after INTEGER, as they are conceptually
siblings---group similar things together.

>  			return opterror(opt, "expects a numerical value", flags);
>  		return 0;
>  
> +	case OPTION_ULONG:

This one is placed right from that point of view ;-)

> +		if (unset) {
> +			*(unsigned long *)opt->value = 0;
> +			return 0;
> +		}
> +		if (opt->flags & PARSE_OPT_OPTARG && !p->opt) {
> +			*(unsigned long *)opt->value = opt->defval;
> +			return 0;
> +		}
> +		if (get_arg(p, opt, flags, &arg))
> +			return -1;
> +		if (!git_parse_ulong(arg, opt->value))
> +			return opterror(opt, "expects a numerical value", flags);

This used to be:

> -		die(_("unable to parse value '%s' for option %s"),
> -		    arg, opt->long_name);

but opterror() talks about which option, so there is no information
loss by losing "for option %s" from here.  That means there is only
one difference for pack-objects:

    $ git pack-objects --max-pack-size=1T
    fatal: unable to parse value '1T' for option max-pack-size
    $ ./git pack-objects --max-pack-size=1T
    error: option `max-pack-size' expects a numerical value
    usage: git pack-objects --stdout [options...
    ... 30 more lines omitted ...

Eh, make that two:

 * We no longer say what value we did not like.  The user presumably
   knows what he typed, so this is only a minor loss.

 * We used to stop without giving "usage", as the error message was
   specific enough.  We now spew descriptions on other options
   unrelated to the specific error the user may want to concentrate
   on.  Perhaps this is a minor regression.

I wonder if "expects a numerical value" is the best way to say this.

Ponder:

 - we do not take "4.8"
 - we do not take "-4".
 - people may not realize, from "numerical", that we take "5M".

Except for the minor nits above, I think this is a good change.

This is a totally unrelated tangent that does not have to be part of
your series, but we probably should take "4.8M"; I do not think we
currently do.

Oh, and perhaps 1T, too.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/3] Add filter-objects command
  2015-06-19 10:52       ` Jeff King
@ 2015-06-19 18:28         ` Junio C Hamano
  0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2015-06-19 18:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Charles Bailey, git

Jeff King <peff@peff.net> writes:

> Right, my point was only that it works for _your_ particular
> filter, but it would be nice to have something more general. And
> we already have "cat-file --batch-check". IOW, I think I would
> prefer the "magical" form because it's a better scripting building
> block. As you note, "filter-objects" without any filters is
> exactly that. Your 10 extra lines of C code are not exactly bloat,
> but I just wonder if other people will find it all that useful.

Yup.  I do not mind a fast "enumerate all objects" but I suspect
that making "all" fast may turn out to be not so great a trade-off
after all, as you would need more work on the "now we have all
coming from our input, let's filter with this and that criteria"
downstream in general cases.  Graph-based filtering e.g. "Oops, here
is our whole customer database committed by mistake--which branch
should we rewrite to nuke?" inherently is much more costly to do in
the downstream that essentially has to reconstruct the graph around
interesting parts of the history, and is better done by "enumerate"
phase spending time to do the actual graph traversal.

And "filter-anything" should not be the name for "enumerate" command
that comes on the upstream of a pipe.  You usually call what is
downstream of a pipe "a filter".

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/3] Correct test-parse-options to handle negative ints
  2015-06-19  9:10 ` [PATCH 1/3] Correct test-parse-options to handle negative ints Charles Bailey
@ 2015-06-19 18:28   ` Junio C Hamano
  0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2015-06-19 18:28 UTC (permalink / raw)
  To: Charles Bailey; +Cc: git

Charles Bailey <charles@hashpling.org> writes:

> From: Charles Bailey <cbailey32@bloomberg.net>
>
> Fix the printf specification to treat 'integer' as the signed type that
> it is and add a test that checks that we parse negative option
> arguments.
>
> Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
> ---

Makes sense.  Will queue.

>  t/t0040-parse-options.sh | 2 ++
>  test-parse-options.c     | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh
> index b044785..ecb7417 100755
> --- a/t/t0040-parse-options.sh
> +++ b/t/t0040-parse-options.sh
> @@ -151,6 +151,8 @@ test_expect_success 'short options' '
>  	test_must_be_empty output.err
>  '
>  
> +test_expect_success 'OPT_INT() negative' 'check integer: -2345 -i -2345'
> +
>  cat > expect << EOF
>  boolean: 2
>  integer: 1729
> diff --git a/test-parse-options.c b/test-parse-options.c
> index 5dabce6..7c492cf 100644
> --- a/test-parse-options.c
> +++ b/test-parse-options.c
> @@ -82,7 +82,7 @@ int main(int argc, char **argv)
>  	argc = parse_options(argc, (const char **)argv, prefix, options, usage, 0);
>  
>  	printf("boolean: %d\n", boolean);
> -	printf("integer: %u\n", integer);
> +	printf("integer: %d\n", integer);
>  	printf("timestamp: %lu\n", timestamp);
>  	printf("string: %s\n", string ? string : "(not set)");
>  	printf("abbrev: %d\n", abbrev);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c
  2015-06-19 17:58   ` Junio C Hamano
@ 2015-06-19 18:39     ` Junio C Hamano
  2015-06-20 15:31       ` Jakub Narębski
  2015-06-19 18:47     ` Jakub Narębski
  2015-06-20 16:51     ` Charles Bailey
  2 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2015-06-19 18:39 UTC (permalink / raw)
  To: Charles Bailey; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Except for the minor nits above, I think this is a good change.

Oh, I forgot to mention one thing.  I am not sure if this should be
called ULONG.  "unsigned long"-ness is not the most important part
of this thing from the end-user's point of view, and also from the
point of view of the programmer who supports end-users by using this
new feature.

It is "unlike OPT_INTEGER, the user can specify it as a human
readble scaled quantity" that is the reason to use this new thing.
I think we discussed to introduce OPT_HUMINT (HUM stands for HUMAN,
obviously) or some name like that a few years ago to do exactly
this, but that is not a great name, either.

I was tempted to suggest a name that has "size" in it, but because
places that we may conceivably want to use it in the future would be
to specify:

 - sizes, e.g. "split the packfiles into 4.3G chunks".

 - counts, e.g. "show me the most recent 2k commits".

 - bandwidth, e.g. "limit the transfer to consume at most 2M bps".

which is not limited to size, it is not a very good idea, either.

OPT_SCALED_ULONG(), or something with "scaled" in its name, perhaps?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c
  2015-06-19 17:58   ` Junio C Hamano
  2015-06-19 18:39     ` Junio C Hamano
@ 2015-06-19 18:47     ` Jakub Narębski
  2015-06-20 16:51     ` Charles Bailey
  2 siblings, 0 replies; 51+ messages in thread
From: Jakub Narębski @ 2015-06-19 18:47 UTC (permalink / raw)
  To: Junio C Hamano, Charles Bailey; +Cc: git

W dniu 2015-06-19 o 19:58, Junio C Hamano pisze:
> Charles Bailey <charles@hashpling.org> writes: 
[...]
>> +		if (!git_parse_ulong(arg, opt->value))
>> +			return opterror(opt, "expects a numerical value", flags);
> 
> This used to be:
> 
>> -		die(_("unable to parse value '%s' for option %s"),
>> -		    arg, opt->long_name);
> 
> but opterror() talks about which option, so there is no information
> loss by losing "for option %s" from here.  That means there is only
> one difference for pack-objects:
> 
>     $ git pack-objects --max-pack-size=1T
>     fatal: unable to parse value '1T' for option max-pack-size
>     $ ./git pack-objects --max-pack-size=1T
>     error: option `max-pack-size' expects a numerical value
>     usage: git pack-objects --stdout [options...
>     ... 30 more lines omitted ...
> 
> Eh, make that two:
> 
>  * We no longer say what value we did not like.  The user presumably
>    knows what he typed, so this is only a minor loss.

Well, in this case this is not a problem, but for longer commandline
invocation it might be hard to find the exact argument among all the
options (though I don't think there is any integer-accepting option
that can be repeated).
> 
>  * We used to stop without giving "usage", as the error message was
>    specific enough.  We now spew descriptions on other options
>    unrelated to the specific error the user may want to concentrate
>    on.  Perhaps this is a minor regression.
> 
> I wonder if "expects a numerical value" is the best way to say this.

Is "expects numerical value" easier to understand than "unable to
parse value"?
 
> Ponder:
> 
>  - we do not take "4.8"

   - we won't take locale specific "4,8" (for some locales)

   - "4O" is not numerical... "40" is

>  - we do not take "-4".
>  - people may not realize, from "numerical", that we take "5M".
> 
> Except for the minor nits above, I think this is a good change.
> 
> This is a totally unrelated tangent that does not have to be part of
> your series, but we probably should take "4.8M"; I do not think we
> currently do.
> 
> Oh, and perhaps 1T, too.
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c
  2015-06-19 18:39     ` Junio C Hamano
@ 2015-06-20 15:31       ` Jakub Narębski
  0 siblings, 0 replies; 51+ messages in thread
From: Jakub Narębski @ 2015-06-20 15:31 UTC (permalink / raw)
  To: Junio C Hamano, Charles Bailey; +Cc: git

W dniu 2015-06-19 o 20:39, Junio C Hamano pisze:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> Except for the minor nits above, I think this is a good change.
> 
> Oh, I forgot to mention one thing.  I am not sure if this should be
> called ULONG.  "unsigned long"-ness is not the most important part
> of this thing from the end-user's point of view, and also from the
> point of view of the programmer who supports end-users by using this
> new feature.
> 
> It is "unlike OPT_INTEGER, the user can specify it as a human
> readble scaled quantity" that is the reason to use this new thing.
> I think we discussed to introduce OPT_HUMINT (HUM stands for HUMAN,
> obviously) or some name like that a few years ago to do exactly
> this, but that is not a great name, either.

On the output side it is often called --human-readable (e.g. du(1)),
I don't know how it is called on input side (e.g. in 'dd' and friends).

> I was tempted to suggest a name that has "size" in it, but because
> places that we may conceivably want to use it in the future would be
> to specify:
> 
>  - sizes, e.g. "split the packfiles into 4.3G chunks".
> 
>  - counts, e.g. "show me the most recent 2k commits".
> 
>  - bandwidth, e.g. "limit the transfer to consume at most 2M bps".
> 
> which is not limited to size, it is not a very good idea, either.
> 
> OPT_SCALED_ULONG(), or something with "scaled" in its name, perhaps?

OPT_HUMAN_READABLE_INTEGER() is probably out as too long? ;-P

-- 
Jakub Narębski

 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c
  2015-06-19 17:58   ` Junio C Hamano
  2015-06-19 18:39     ` Junio C Hamano
  2015-06-19 18:47     ` Jakub Narębski
@ 2015-06-20 16:51     ` Charles Bailey
  2015-06-20 17:47       ` Junio C Hamano
  2 siblings, 1 reply; 51+ messages in thread
From: Charles Bailey @ 2015-06-20 16:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Jun 19, 2015 at 10:58:51AM -0700, Junio C Hamano wrote:
> Charles Bailey <charles@hashpling.org> writes:
> 
> Please place it immediately after INTEGER, as they are conceptually
> siblings---group similar things together.

Sorry, this is a bad habit from working on projects where changing the
value of existing enum identifiers cause bad things.

> This used to be:
> 
> > -		die(_("unable to parse value '%s' for option %s"),
> > -		    arg, opt->long_name);
> 
> but opterror() talks about which option, so there is no information
> loss by losing "for option %s" from here.  That means there is only
> one difference for pack-objects:
> 
>     $ git pack-objects --max-pack-size=1T
>     fatal: unable to parse value '1T' for option max-pack-size
>     $ ./git pack-objects --max-pack-size=1T
>     error: option `max-pack-size' expects a numerical value
>     usage: git pack-objects --stdout [options...
>     ... 30 more lines omitted ...
> 
> Eh, make that two:
> 
>  * We no longer say what value we did not like.  The user presumably
>    knows what he typed, so this is only a minor loss.
> 
>  * We used to stop without giving "usage", as the error message was
>    specific enough.  We now spew descriptions on other options
>    unrelated to the specific error the user may want to concentrate
>    on.  Perhaps this is a minor regression.
> 
> I wonder if "expects a numerical value" is the best way to say this.

I was aware that I was changing the error reporting for max-pack-size
and window-memory but thought that by going with the existing behaviour
of OPT_INTEGER I'd be going with a more established pattern.

These observations also seem to apply to OPT_INTEGER handling. Would
this be something that we'd want to fix too?

Currently git package-objects --depth=5.5 prints:

    error: option `depth' expects a numerical value
    usage: git pack-objects --stdout [options...
    [... many more lines omitted ...]

Obviously, changing this to skip the full usage report would affect many
existing commands.

Also, I preserved the PARSE_OPT_NONEG flag for OPT_ULONG but would this
ever not make sense for an OPT_INTEGER option?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c
  2015-06-20 16:51     ` Charles Bailey
@ 2015-06-20 17:47       ` Junio C Hamano
  0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2015-06-20 17:47 UTC (permalink / raw)
  To: Charles Bailey; +Cc: git

Charles Bailey <charles@hashpling.org> writes:

> On Fri, Jun 19, 2015 at 10:58:51AM -0700, Junio C Hamano wrote:
>
>> Eh, make that two:
>> 
>>  * We no longer say what value we did not like.  The user presumably
>>    knows what he typed, so this is only a minor loss.
>> 
>>  * We used to stop without giving "usage", as the error message was
>>    specific enough.  We now spew descriptions on other options
>>    unrelated to the specific error the user may want to concentrate
>>    on.  Perhaps this is a minor regression.
>> 
>> I wonder if "expects a numerical value" is the best way to say this.
>
> I was aware that I was changing the error reporting for max-pack-size
> and window-memory but thought that by going with the existing behaviour
> of OPT_INTEGER I'd be going with a more established pattern.

That is OK.  I just wanted to see that design decision explicitly
recorded in the proposed log message.

> Currently git package-objects --depth=5.5 prints:
>
>     error: option `depth' expects a numerical value
>     usage: git pack-objects --stdout [options...
>     [... many more lines omitted ...]

Interesting.  I get this instead:

    git: 'package-objects' is not a git command. See 'git --help'.

;-)  Jokes aside...

> Obviously, changing this to skip the full usage report would affect many
> existing commands.

Yes and I wouldn't suggest changing that in the same commit that
exposes the human-readable quantity parsing to parse-options API.
That is why I said "Perhaps this is a minor regression".  It is a
change in behaviour, and it may make it slightly worse, but on the
other hand it makes it in line with other types of options, so it
may be OK.

If we wanted to teach commands to omit "usage" when parsing of a
single option failed, we should be doing that consistently for
everybody, not just to pack-objects, and that is outside the scope
of this patch, I would think.

	Side note: Just to make it clear, regarding anything I say
	is "outside the scope of this patch", I am not asking you to
	do them as follow-up patches (as a precondition to accept
	this patch).  For that matter, I am not convinced myself
	that some of them are even worth doing.  And I am not asking
	you _not_ to do these changes, ever, either.  I am just
	asking you _not_ to do any of them in _this_ patch.

> Also, I preserved the PARSE_OPT_NONEG flag for OPT_ULONG but would this
> ever not make sense for an OPT_INTEGER option?

It depends on what "git cmd --depth=4 --no-depth" should do.  In any
case, changing OPT_INT would be a separate topic outside the scope
of this patch, I think.

My gut feeling is that

    git pack-objects --max-pack-size=20m --no-max-pack-size

should be usable as a way to countermand a pack size limit given
earlier on the command line to make it unlimited, but that is
definitely a separate topic outside the scope of this patch (whose
purpose is to make an existing callback available to other callers).

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Improvements to integer option parsing
  2015-06-19  9:10 Improvements to parse-options and a new filter-objects command Charles Bailey
                   ` (2 preceding siblings ...)
  2015-06-19  9:10 ` [PATCH 3/3] Add filter-objects command Charles Bailey
@ 2015-06-21 18:25 ` Charles Bailey
  2015-06-21 18:25   ` [PATCH 1/2] Correct test-parse-options to handle negative ints Charles Bailey
                     ` (2 more replies)
  2015-06-21 19:20 ` Fast enumeration of objects Charles Bailey
  4 siblings, 3 replies; 51+ messages in thread
From: Charles Bailey @ 2015-06-21 18:25 UTC (permalink / raw)
  To: Junio C Hamano, git

This is a re-roll of the first two patches in my previous series which used to
include "filter-objects" which is now a separate topic.

[PATCH 1/2] Correct test-parse-options to handle negative ints

The first one has changed only in that I've moved the additional test to a more
logical place in the test file.

[PATCH 2/2] Move unsigned long option parsing out of pack-objects.c

I've made the following changes to the second commit:

- renamed this OPT_MAGNITUDE to try and convey something that is
both unsigned and might benefit from a 'scale' suffix. I'm expecting
more discussion on the name!

- fixed the enum ordering to put this close to OPT_INTEGER

- added documentation to api-parse-options.txt

- marginally improved the opterror message on failed parses

- noted the change in behavior for the error messages generated for
pack-objects' --max-pack-size and --window-memory in the commit message

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 1/2] Correct test-parse-options to handle negative ints
  2015-06-21 18:25 ` Improvements to integer option parsing Charles Bailey
@ 2015-06-21 18:25   ` Charles Bailey
  2015-06-21 18:25   ` [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c Charles Bailey
  2015-06-22 22:09   ` Improvements to integer option parsing Junio C Hamano
  2 siblings, 0 replies; 51+ messages in thread
From: Charles Bailey @ 2015-06-21 18:25 UTC (permalink / raw)
  To: Junio C Hamano, git

From: Charles Bailey <cbailey32@bloomberg.net>

Fix the printf specification to treat 'integer' as the signed type that
it is and add a test that checks that we parse negative option
arguments.

Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
---
 t/t0040-parse-options.sh | 2 ++
 test-parse-options.c     | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh
index b044785..372d521 100755
--- a/t/t0040-parse-options.sh
+++ b/t/t0040-parse-options.sh
@@ -132,6 +132,8 @@ test_expect_success 'OPT_BOOL() no negation #2' 'check_unknown_i18n --no-no-fear
 
 test_expect_success 'OPT_BOOL() positivation' 'check boolean: 0 -D --doubt'
 
+test_expect_success 'OPT_INT() negative' 'check integer: -2345 -i -2345'
+
 cat > expect << EOF
 boolean: 2
 integer: 1729
diff --git a/test-parse-options.c b/test-parse-options.c
index 5dabce6..7c492cf 100644
--- a/test-parse-options.c
+++ b/test-parse-options.c
@@ -82,7 +82,7 @@ int main(int argc, char **argv)
 	argc = parse_options(argc, (const char **)argv, prefix, options, usage, 0);
 
 	printf("boolean: %d\n", boolean);
-	printf("integer: %u\n", integer);
+	printf("integer: %d\n", integer);
 	printf("timestamp: %lu\n", timestamp);
 	printf("string: %s\n", string ? string : "(not set)");
 	printf("abbrev: %d\n", abbrev);
-- 
2.4.0.53.g8440f74

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c
  2015-06-21 18:25 ` Improvements to integer option parsing Charles Bailey
  2015-06-21 18:25   ` [PATCH 1/2] Correct test-parse-options to handle negative ints Charles Bailey
@ 2015-06-21 18:25   ` Charles Bailey
  2015-06-21 18:30     ` Charles Bailey
  2015-06-22 22:08     ` Junio C Hamano
  2015-06-22 22:09   ` Improvements to integer option parsing Junio C Hamano
  2 siblings, 2 replies; 51+ messages in thread
From: Charles Bailey @ 2015-06-21 18:25 UTC (permalink / raw)
  To: Junio C Hamano, git

From: Charles Bailey <cbailey32@bloomberg.net>

The unsigned long option parsing (including 'k'/'m'/'g' suffix parsing)
is more widely applicable. Add support for OPT_MAGNITUDE to
parse-options.h and change pack-objects.c use this support.

The error behavior on parse errors follows that of OPT_INTEGER.
The name of the option that failed to parse is reported with a brief
message describing the expect format for the option argument and then
the full usage message for the command invoked.

This is differs from the previous behavior for OPT_ULONG used in
pack-objects for --max-pack-size and --window-memory which used to
display the value supplied in the error message and did not display the
full usage message.

Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
---
 Documentation/technical/api-parse-options.txt |  6 ++++
 builtin/pack-objects.c                        | 25 +++------------
 parse-options.c                               | 17 ++++++++++
 parse-options.h                               |  3 ++
 t/t0040-parse-options.sh                      | 45 ++++++++++++++++++++++++---
 test-parse-options.c                          |  3 ++
 6 files changed, 73 insertions(+), 26 deletions(-)

diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt
index 1f2db31..525cb2f 100644
--- a/Documentation/technical/api-parse-options.txt
+++ b/Documentation/technical/api-parse-options.txt
@@ -168,6 +168,12 @@ There are some macros to easily define options:
 	Introduce an option with integer argument.
 	The integer is put into `int_var`.
 
+`OPT_MAGNITUDE(short, long, &unsigned_long_var, description)`::
+	Introduce an option with a size argument. The argument must be a
+	non-negative integer and may include a suffix of 'k', 'm' or 'g' to
+	scale the provided value by 1024, 1024^2 or 1024^3 respectively.
+	The scaled value is put into `unsigned_long_var`.
+
 `OPT_DATE(short, long, &int_var, description)`::
 	Introduce an option with date argument, see `approxidate()`.
 	The timestamp is put into `int_var`.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 80fe8c7..62cc16d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2588,23 +2588,6 @@ static int option_parse_unpack_unreachable(const struct option *opt,
 	return 0;
 }
 
-static int option_parse_ulong(const struct option *opt,
-			      const char *arg, int unset)
-{
-	if (unset)
-		die(_("option %s does not accept negative form"),
-		    opt->long_name);
-
-	if (!git_parse_ulong(arg, opt->value))
-		die(_("unable to parse value '%s' for option %s"),
-		    arg, opt->long_name);
-	return 0;
-}
-
-#define OPT_ULONG(s, l, v, h) \
-	{ OPTION_CALLBACK, (s), (l), (v), "n", (h),	\
-	  PARSE_OPT_NONEG, option_parse_ulong }
-
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
@@ -2627,16 +2610,16 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		{ OPTION_CALLBACK, 0, "index-version", NULL, N_("version[,offset]"),
 		  N_("write the pack index file in the specified idx format version"),
 		  0, option_parse_index_version },
-		OPT_ULONG(0, "max-pack-size", &pack_size_limit,
-			  N_("maximum size of each output pack file")),
+		OPT_MAGNITUDE(0, "max-pack-size", &pack_size_limit,
+			      N_("maximum size of each output pack file")),
 		OPT_BOOL(0, "local", &local,
 			 N_("ignore borrowed objects from alternate object store")),
 		OPT_BOOL(0, "incremental", &incremental,
 			 N_("ignore packed objects")),
 		OPT_INTEGER(0, "window", &window,
 			    N_("limit pack window by objects")),
-		OPT_ULONG(0, "window-memory", &window_memory_limit,
-			  N_("limit pack window by memory in addition to object limit")),
+		OPT_MAGNITUDE(0, "window-memory", &window_memory_limit,
+			      N_("limit pack window by memory in addition to object limit")),
 		OPT_INTEGER(0, "depth", &depth,
 			    N_("maximum length of delta chain allowed in the resulting pack")),
 		OPT_BOOL(0, "reuse-delta", &reuse_delta,
diff --git a/parse-options.c b/parse-options.c
index 80106c0..101b649 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -180,6 +180,23 @@ static int get_value(struct parse_opt_ctx_t *p,
 			return opterror(opt, "expects a numerical value", flags);
 		return 0;
 
+	case OPTION_MAGNITUDE:
+		if (unset) {
+			*(unsigned long *)opt->value = 0;
+			return 0;
+		}
+		if (opt->flags & PARSE_OPT_OPTARG && !p->opt) {
+			*(unsigned long *)opt->value = opt->defval;
+			return 0;
+		}
+		if (get_arg(p, opt, flags, &arg))
+			return -1;
+		if (!git_parse_ulong(arg, opt->value))
+			return opterror(opt,
+				"expects a integer value with an optional k/m/g suffix",
+				flags);
+		return 0;
+
 	default:
 		die("should not happen, someone must be hit on the forehead");
 	}
diff --git a/parse-options.h b/parse-options.h
index c71e9da..ca865f6 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -16,6 +16,7 @@ enum parse_opt_type {
 	/* options with arguments (usually) */
 	OPTION_STRING,
 	OPTION_INTEGER,
+	OPTION_MAGNITUDE,
 	OPTION_CALLBACK,
 	OPTION_LOWLEVEL_CALLBACK,
 	OPTION_FILENAME
@@ -129,6 +130,8 @@ struct option {
 #define OPT_CMDMODE(s, l, v, h, i) { OPTION_CMDMODE, (s), (l), (v), NULL, \
 				      (h), PARSE_OPT_NOARG|PARSE_OPT_NONEG, NULL, (i) }
 #define OPT_INTEGER(s, l, v, h)     { OPTION_INTEGER, (s), (l), (v), N_("n"), (h) }
+#define OPT_MAGNITUDE(s, l, v, h)   { OPTION_MAGNITUDE, (s), (l), (v), \
+				      N_("n"), (h), PARSE_OPT_NONEG }
 #define OPT_STRING(s, l, v, a, h)   { OPTION_STRING,  (s), (l), (v), (a), (h) }
 #define OPT_STRING_LIST(s, l, v, a, h) \
 				    { OPTION_CALLBACK, (s), (l), (v), (a), \
diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh
index 372d521..9be6411 100755
--- a/t/t0040-parse-options.sh
+++ b/t/t0040-parse-options.sh
@@ -19,6 +19,7 @@ usage: test-parse-options <options>
 
     -i, --integer <n>     get a integer
     -j <n>                get a integer, too
+    -m, --magnitude <n>   get a magnitude
     --set23               set integer to 23
     -t <time>             get timestamp of <time>
     -L, --length <str>    get length of <str>
@@ -58,6 +59,7 @@ mv expect expect.err
 cat >expect.template <<EOF
 boolean: 0
 integer: 0
+magnitude: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -134,9 +136,30 @@ test_expect_success 'OPT_BOOL() positivation' 'check boolean: 0 -D --doubt'
 
 test_expect_success 'OPT_INT() negative' 'check integer: -2345 -i -2345'
 
+test_expect_success 'OPT_MAGNITUDE() simple' '
+	check magnitude: 2345678 -m 2345678
+'
+
+test_expect_success 'OPT_MAGNITUDE() kilo' '
+	check magnitude: 239616 -m 234k
+'
+
+test_expect_success 'OPT_MAGNITUDE() mega' '
+	check magnitude: 104857600 -m 100m
+'
+
+test_expect_success 'OPT_MAGNITUDE() giga' '
+	check magnitude: 1073741824 -m 1g
+'
+
+test_expect_success 'OPT_MAGNITUDE() 3giga' '
+	check magnitude: 3221225472 -m 3g
+'
+
 cat > expect << EOF
 boolean: 2
 integer: 1729
+magnitude: 16384
 timestamp: 0
 string: 123
 abbrev: 7
@@ -147,8 +170,8 @@ file: prefix/my.file
 EOF
 
 test_expect_success 'short options' '
-	test-parse-options -s123 -b -i 1729 -b -vv -n -F my.file \
-	> output 2> output.err &&
+	test-parse-options -s123 -b -i 1729 -m 16k -b -vv -n -F my.file \
+	>output 2>output.err &&
 	test_cmp expect output &&
 	test_must_be_empty output.err
 '
@@ -156,6 +179,7 @@ test_expect_success 'short options' '
 cat > expect << EOF
 boolean: 2
 integer: 1729
+magnitude: 16384
 timestamp: 0
 string: 321
 abbrev: 10
@@ -166,9 +190,10 @@ file: prefix/fi.le
 EOF
 
 test_expect_success 'long options' '
-	test-parse-options --boolean --integer 1729 --boolean --string2=321 \
-		--verbose --verbose --no-dry-run --abbrev=10 --file fi.le\
-		--obsolete > output 2> output.err &&
+	test-parse-options --boolean --integer 1729 --magnitude 16k \
+		--boolean --string2=321 --verbose --verbose --no-dry-run \
+		--abbrev=10 --file fi.le --obsolete \
+		>output 2>output.err &&
 	test_must_be_empty output.err &&
 	test_cmp expect output
 '
@@ -182,6 +207,7 @@ test_expect_success 'missing required value' '
 cat > expect << EOF
 boolean: 1
 integer: 13
+magnitude: 0
 timestamp: 0
 string: 123
 abbrev: 7
@@ -204,6 +230,7 @@ test_expect_success 'intermingled arguments' '
 cat > expect << EOF
 boolean: 0
 integer: 2
+magnitude: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -232,6 +259,7 @@ test_expect_success 'ambiguously abbreviated option' '
 cat > expect << EOF
 boolean: 0
 integer: 0
+magnitude: 0
 timestamp: 0
 string: 123
 abbrev: 7
@@ -270,6 +298,7 @@ test_expect_success 'detect possible typos' '
 cat > expect <<EOF
 boolean: 0
 integer: 0
+magnitude: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -289,6 +318,7 @@ test_expect_success 'keep some options as arguments' '
 cat > expect <<EOF
 boolean: 0
 integer: 0
+magnitude: 0
 timestamp: 1
 string: (not set)
 abbrev: 7
@@ -310,6 +340,7 @@ cat > expect <<EOF
 Callback: "four", 0
 boolean: 5
 integer: 4
+magnitude: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -338,6 +369,7 @@ test_expect_success 'OPT_CALLBACK() and callback errors work' '
 cat > expect <<EOF
 boolean: 1
 integer: 23
+magnitude: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -362,6 +394,7 @@ test_expect_success 'OPT_NEGBIT() and OPT_SET_INT() work' '
 cat > expect <<EOF
 boolean: 6
 integer: 0
+magnitude: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -392,6 +425,7 @@ test_expect_success 'OPT_COUNTUP() with PARSE_OPT_NODASH works' '
 cat > expect <<EOF
 boolean: 0
 integer: 12345
+magnitude: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -410,6 +444,7 @@ test_expect_success 'OPT_NUMBER_CALLBACK() works' '
 cat >expect <<EOF
 boolean: 0
 integer: 0
+magnitude: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
diff --git a/test-parse-options.c b/test-parse-options.c
index 7c492cf..2c8c8f1 100644
--- a/test-parse-options.c
+++ b/test-parse-options.c
@@ -4,6 +4,7 @@
 
 static int boolean = 0;
 static int integer = 0;
+static unsigned long magnitude = 0;
 static unsigned long timestamp;
 static int abbrev = 7;
 static int verbose = 0, dry_run = 0, quiet = 0;
@@ -48,6 +49,7 @@ int main(int argc, char **argv)
 		OPT_GROUP(""),
 		OPT_INTEGER('i', "integer", &integer, "get a integer"),
 		OPT_INTEGER('j', NULL, &integer, "get a integer, too"),
+		OPT_MAGNITUDE('m', "magnitude", &magnitude, "get a magnitude"),
 		OPT_SET_INT(0, "set23", &integer, "set integer to 23", 23),
 		OPT_DATE('t', NULL, &timestamp, "get timestamp of <time>"),
 		OPT_CALLBACK('L', "length", &integer, "str",
@@ -83,6 +85,7 @@ int main(int argc, char **argv)
 
 	printf("boolean: %d\n", boolean);
 	printf("integer: %d\n", integer);
+	printf("magnitude: %lu\n", magnitude);
 	printf("timestamp: %lu\n", timestamp);
 	printf("string: %s\n", string ? string : "(not set)");
 	printf("abbrev: %d\n", abbrev);
-- 
2.4.0.53.g8440f74

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c
  2015-06-21 18:25   ` [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c Charles Bailey
@ 2015-06-21 18:30     ` Charles Bailey
  2015-06-22 22:03       ` Junio C Hamano
  2015-06-22 22:08     ` Junio C Hamano
  1 sibling, 1 reply; 51+ messages in thread
From: Charles Bailey @ 2015-06-21 18:30 UTC (permalink / raw)
  To: Junio C Hamano, git

On Sun, Jun 21, 2015 at 07:25:44PM +0100, Charles Bailey wrote:
> From: Charles Bailey <cbailey32@bloomberg.net>
> 
> diff --git a/parse-options.c b/parse-options.c
> index 80106c0..101b649 100644
> --- a/parse-options.c
> +++ b/parse-options.c
> @@ -180,6 +180,23 @@ static int get_value(struct parse_opt_ctx_t *p,
>  			return opterror(opt, "expects a numerical value", flags);
>  		return 0;
>  
> +	case OPTION_MAGNITUDE:
> +		if (unset) {
> +			*(unsigned long *)opt->value = 0;
> +			return 0;
> +		}
> +		if (opt->flags & PARSE_OPT_OPTARG && !p->opt) {
> +			*(unsigned long *)opt->value = opt->defval;
> +			return 0;
> +		}
> +		if (get_arg(p, opt, flags, &arg))
> +			return -1;
> +		if (!git_parse_ulong(arg, opt->value))
> +			return opterror(opt,
> +				"expects a integer value with an optional k/m/g suffix",
> +				flags);
> +		return 0;
> +

Spotted after sending:
s/expects a integer/expects an integer/

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Fast enumeration of objects
  2015-06-19  9:10 Improvements to parse-options and a new filter-objects command Charles Bailey
                   ` (3 preceding siblings ...)
  2015-06-21 18:25 ` Improvements to integer option parsing Charles Bailey
@ 2015-06-21 19:20 ` Charles Bailey
  2015-06-21 19:20   ` [PATCH] Add list-all-objects command Charles Bailey
  2015-06-22  8:35   ` Fast enumeration of objects Jeff King
  4 siblings, 2 replies; 51+ messages in thread
From: Charles Bailey @ 2015-06-21 19:20 UTC (permalink / raw)
  To: Junio C Hamano, git

This is a re-casting of my previous filter-objects command but without
any of the filtering so it is now just "list-all-objects".

I have retained the "--verbose" option which outputs the same format as
the default "cat-file --batch-check" as it provides a useful performance
gain to filtering though "cat-file" if this basic information is all
that is needed.

The motivating use case is to enable a script to quickly scan a large
number of repositories for any large objects.

I performed some test timings of some different commands on a clone of
the Linux kernel which was completely packed.

	$ time git rev-list --all --objects |
		cut -d" " -f1 |
		git cat-file --batch-check |
		awk '{if ($3 >= 512000) { print $1 }}' |
		wc -l
	958

	real    0m30.823s
	user    0m41.904s
	sys     0m7.728s

list-all-objects gives a significant improvement:

	$ time git list-all-objects |
		git cat-file --batch-check |
		awk '{if ($3 >= 512000) { print $1 }}' |
		wc -l
	958

	real    0m9.585s
	user    0m10.820s
	sys     0m4.960s

skipping the cat-filter filter is a lesser but still significant
improvement:

	$ time git list-all-objects -v |
		awk '{if ($3 >= 512000) { print $1 }}' |
		wc -l
	958

	real    0m5.637s
	user    0m6.652s
	sys     0m0.156s

The old filter-objects could do the size filter a little be faster, but
not by much:

	$ time git filter-objects --min-size=500k |
		wc -l
	958

	real    0m4.564s
	user    0m4.496s
	sys     0m0.064s

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH] Add list-all-objects command
  2015-06-21 19:20 ` Fast enumeration of objects Charles Bailey
@ 2015-06-21 19:20   ` Charles Bailey
  2015-06-22  8:38     ` Jeff King
  2015-06-22  9:57     ` Duy Nguyen
  2015-06-22  8:35   ` Fast enumeration of objects Jeff King
  1 sibling, 2 replies; 51+ messages in thread
From: Charles Bailey @ 2015-06-21 19:20 UTC (permalink / raw)
  To: Junio C Hamano, git

From: Charles Bailey <cbailey32@bloomberg.net>

list-all-objects is a command to print the ids of all objects in the
object database of a repository. It is designed as a low overhead
interface for scripts that want to analyse all objects but don't require
the ordering implied by a revision walk.

It will list all objects, loose and packed, and will include unreachable
objects.

list-all-objects is faster that "rev-list --all --objects" but there is
no guarantee as to the order in which objects will be listed.

Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
---
 Documentation/git-list-all-objects.txt | 29 +++++++++++++++
 Makefile                               |  1 +
 builtin.h                              |  1 +
 builtin/list-all-objects.c             | 64 ++++++++++++++++++++++++++++++++++
 git.c                                  |  1 +
 t/t8100-list-all-objects.sh            | 45 ++++++++++++++++++++++++
 6 files changed, 141 insertions(+)
 create mode 100644 Documentation/git-list-all-objects.txt
 create mode 100644 builtin/list-all-objects.c
 create mode 100755 t/t8100-list-all-objects.sh

diff --git a/Documentation/git-list-all-objects.txt b/Documentation/git-list-all-objects.txt
new file mode 100644
index 0000000..5f28d41
--- /dev/null
+++ b/Documentation/git-list-all-objects.txt
@@ -0,0 +1,29 @@
+git-list-all-objects(1)
+=======================
+
+NAME
+----
+git-list-all-objects - List all objects in the repository.
+
+SYNOPSIS
+--------
+[verse]
+'git list-all-objects' [-v|--verbose]
+
+DESCRIPTION
+-----------
+List the ids of all objects in a repository, including any unreachable objects.
+If `--verbose` is specified then the object's type and size is printed out as
+well as its id.
+
+OPTIONS
+-------
+
+-v::
+--verbose::
+	Output in the followin format instead of just printing object ids:
+	<sha1> SP <type> SP <size>
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 149f1c7..cf4f0c3 100644
--- a/Makefile
+++ b/Makefile
@@ -853,6 +853,7 @@ BUILTIN_OBJS += builtin/help.o
 BUILTIN_OBJS += builtin/index-pack.o
 BUILTIN_OBJS += builtin/init-db.o
 BUILTIN_OBJS += builtin/interpret-trailers.o
+BUILTIN_OBJS += builtin/list-all-objects.o
 BUILTIN_OBJS += builtin/log.o
 BUILTIN_OBJS += builtin/ls-files.o
 BUILTIN_OBJS += builtin/ls-remote.o
diff --git a/builtin.h b/builtin.h
index b87df70..112bafb 100644
--- a/builtin.h
+++ b/builtin.h
@@ -74,6 +74,7 @@ extern int cmd_help(int argc, const char **argv, const char *prefix);
 extern int cmd_index_pack(int argc, const char **argv, const char *prefix);
 extern int cmd_init_db(int argc, const char **argv, const char *prefix);
 extern int cmd_interpret_trailers(int argc, const char **argv, const char *prefix);
+extern int cmd_list_all_objects(int argc, const char **argv, const char *prefix);
 extern int cmd_log(int argc, const char **argv, const char *prefix);
 extern int cmd_log_reflog(int argc, const char **argv, const char *prefix);
 extern int cmd_ls_files(int argc, const char **argv, const char *prefix);
diff --git a/builtin/list-all-objects.c b/builtin/list-all-objects.c
new file mode 100644
index 0000000..3b43b02
--- /dev/null
+++ b/builtin/list-all-objects.c
@@ -0,0 +1,64 @@
+#include "cache.h"
+#include "builtin.h"
+#include "revision.h"
+#include "parse-options.h"
+
+#include <stdio.h>
+
+static int verbose;
+
+static int print_object(const unsigned char *sha1)
+{
+	if (verbose) {
+		unsigned long size;
+		int type = sha1_object_info(sha1, &size);
+
+		if (type < 0)
+			return -1;
+
+		printf("%s %s %lu\n", sha1_to_hex(sha1), typename(type), size);
+	}
+	else
+		printf("%s\n", sha1_to_hex(sha1));
+
+	return 0;
+}
+
+static int check_loose_object(const unsigned char *sha1,
+			      const char *path,
+			      void *data)
+{
+	return print_object(sha1);
+}
+
+static int check_packed_object(const unsigned char *sha1,
+			       struct packed_git *pack,
+			       uint32_t pos,
+			       void *data)
+{
+	return print_object(sha1);
+}
+
+static struct option builtin_filter_objects_options[] = {
+	OPT__VERBOSE(&verbose, "show object type and size"),
+	OPT_END()
+};
+
+int cmd_list_all_objects(int argc, const char **argv, const char *prefix)
+{
+	struct packed_git *p;
+
+	argc = parse_options(argc, argv, prefix, builtin_filter_objects_options,
+			     NULL, 0);
+
+	for_each_loose_object(check_loose_object, NULL, 0);
+
+	prepare_packed_git();
+	for (p = packed_git; p; p = p->next) {
+		open_pack_index(p);
+	}
+
+	for_each_packed_object(check_packed_object, NULL, 0);
+
+	return 0;
+}
diff --git a/git.c b/git.c
index 44374b1..81e8ae4 100644
--- a/git.c
+++ b/git.c
@@ -417,6 +417,7 @@ static struct cmd_struct commands[] = {
 	{ "init", cmd_init_db, NO_SETUP },
 	{ "init-db", cmd_init_db, NO_SETUP },
 	{ "interpret-trailers", cmd_interpret_trailers, RUN_SETUP },
+	{ "list-all-objects", cmd_list_all_objects, RUN_SETUP },
 	{ "log", cmd_log, RUN_SETUP },
 	{ "ls-files", cmd_ls_files, RUN_SETUP },
 	{ "ls-remote", cmd_ls_remote, RUN_SETUP_GENTLY },
diff --git a/t/t8100-list-all-objects.sh b/t/t8100-list-all-objects.sh
new file mode 100755
index 0000000..a7b51ce
--- /dev/null
+++ b/t/t8100-list-all-objects.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+test_description='git list-all-objects'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	echo hello, world >file &&
+	git add file &&
+	git commit -m "initial"
+'
+
+test_basic_repo_objects () {
+	git cat-file --batch-check="%(objectname)" <<-EOF >expected.unsorted &&
+		HEAD
+		HEAD:file
+		HEAD^{tree}
+	EOF
+	git list-all-objects >all-objects.unsorted &&
+	sort expected.unsorted >expected &&
+	sort all-objects.unsorted >all-objects &&
+	test_cmp all-objects expected
+}
+
+test_expect_success 'list all objects' '
+	test_basic_repo_objects
+'
+test_expect_success 'list all objects after pack' '
+	git repack -Ad &&
+	test_basic_repo_objects
+'
+
+test_expect_success 'verbose output' '
+	git cat-file --batch-check="%(objectname) %(objecttype) %(objectsize)" \
+			<<-EOF >expected.unsorted &&
+		HEAD
+		HEAD:file
+		HEAD^{tree}
+	EOF
+	git list-all-objects -v >all-objects.unsorted &&
+	sort expected.unsorted >expected &&
+	sort all-objects.unsorted >all-objects &&
+	test_cmp all-objects expected
+'
+
+test_done
-- 
2.4.0.53.g8440f74

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: Fast enumeration of objects
  2015-06-21 19:20 ` Fast enumeration of objects Charles Bailey
  2015-06-21 19:20   ` [PATCH] Add list-all-objects command Charles Bailey
@ 2015-06-22  8:35   ` Jeff King
  2015-06-22 19:44     ` Junio C Hamano
  1 sibling, 1 reply; 51+ messages in thread
From: Jeff King @ 2015-06-22  8:35 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

On Sun, Jun 21, 2015 at 08:20:30PM +0100, Charles Bailey wrote:

> I performed some test timings of some different commands on a clone of
> the Linux kernel which was completely packed.

Thanks for timing things. I think we can fairly easily improve a bit on
what you have here. I'll go through my full analysis, but see the
conclusions at the end.

> 	$ time git rev-list --all --objects |
> 		cut -d" " -f1 |
> 		git cat-file --batch-check |
> 		awk '{if ($3 >= 512000) { print $1 }}' |
> 		wc -l
> 	958
> 
> 	real    0m30.823s
> 	user    0m41.904s
> 	sys     0m7.728s
> 
> list-all-objects gives a significant improvement:
> 
> 	$ time git list-all-objects |
> 		git cat-file --batch-check |
> 		awk '{if ($3 >= 512000) { print $1 }}' |
> 		wc -l
> 	958
> 
> 	real    0m9.585s
> 	user    0m10.820s
> 	sys     0m4.960s

That makes sense; of course these two are not necessarily producing the
same answer (they do in your case because it's a fresh clone, and all of
the objects are reachable). I think that's an acceptable caveat.

You can speed up the second one by asking batch-check only for the parts
you care about:

  git list-all-objects |
  git cat-file --batch-check='%(objectsize) %(objectname)' |
  awk '{if ($1 >= 512000) { print $2 }}' |
  wc -l

That dropped my best-of-five timings for the same test down from 9.5s to
7.0s. The answer should be the same. The reason is that cat-file will
only compute the items it needs to show, and the object-type is more
expensive to get than the size[1].

Replacing awk with:

  perl -alne 'print $F[0] if $F[1] > 512000'

dropped that to 6.0s. That mostly means my awk sucks, but it is
interesting to note that not all of the extra time is pipe overhead
inherent to this approach; your choice of processor matters, too.

If you're willing to get a slightly different answer, but one that is
often just as useful, you can replace the "%(objectsize)" in the
cat-file invocation with "%(objectsize:disk)". That gives you the actual
on-disk size of the object, which includes delta and zlib compression.
For 512K, that produces very different results (because files of that
size may actually be text file). But for most truly huge files, they
typically do not delta or compress at all, and the on-disk size is
roughly the same.

That only shaves off 100-200 milliseconds, though.

[1] If you are wondering why the size is cheaper than the type, it is
    because of deltas. For base objects, we can get either immediately
    from the pack entry's header. For a delta, to get the size we have
    to open the object data; the expected size is part of the delta
    data. So we pay the extra cost to zlib-inflate the first few bytes.
    But finding the type works differently; the type in the pack header
    is OFS_DELTA, so we have to walk back to the parent entry to find
    the real type.  If that parent is a delta, we walk back recursively
    until we hit a base object.

    You'd think that would also make %(objectsize:disk) much cheaper
    than %(objectsize), too. But the disk sizes require computing a
    the pack revindex on the fly, which takes a few hundred milliseconds
    on linux.git.

> skipping the cat-filter filter is a lesser but still significant
> improvement:
> 
> 	$ time git list-all-objects -v |
> 		awk '{if ($3 >= 512000) { print $1 }}' |
> 		wc -l
> 	958
> 
> 	real    0m5.637s
> 	user    0m6.652s
> 	sys     0m0.156s

That's a pretty nice improvement over the piped version. But we cannot
do the same custom-format optimization there, because "-v" does not
support it. It would be nice if it supported the full range of cat-file
formatters.

I did a hacky proof-of-concept, and that brought my 6.0s time down to
4.9s.

I also noticed that cat-file doesn't do any output buffering; this is
because it may be used interactively, line by line, by a caller
controlling both pipes. Replacing write_or_die() with fwrite in my
proof-of-concept dropped the time to 3.7s.

That's faster still than your original (different machines, obviously,
but your times are similar to mine):

> The old filter-objects could do the size filter a little be faster, but
> not by much:
> 
> 	$ time git filter-objects --min-size=500k |
> 		wc -l
> 	958
> 
> 	real    0m4.564s
> 	user    0m4.496s
> 	sys     0m0.064s

This is likely caused by your use of sha1_object_info(), which always
computes the type. Switching to the extended form would probably buy you
another 2 seconds or so.

Also, all my numbers are wall-clock times. The CPU time for my 3.7s time
is actually 6.8s. Whereas doing it all in one process would probably
require 3.0s or so of actual CPU time.

So my conclusions are:

  1. Yes, the pipe/parsing overhead of a separate processor really is
     measurable. That's hidden in the wall-clock time if you have
     multiple cores, but you may care more about CPU time. I still think
     the flexibility is worth it.

  2. Cutting out the pipe to cat-file is worth doing, as it saves a few
     seconds. Cutting out "%(objecttype)" saves a lot, too, and is worth
     doing. We should teach "list-all-objects -v" to use cat-file's
     custom formatters (alternatively, we could just teach cat-file a
     "--batch-all-objects" option rather than add a new command).

  3. We should teach cat-file a "--buffer" option to use fwrite. Even if
     we end up with "list-all-objects --format='%(objectsize)'" for this
     task, it would help all the other uses of cat-file.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] Add list-all-objects command
  2015-06-21 19:20   ` [PATCH] Add list-all-objects command Charles Bailey
@ 2015-06-22  8:38     ` Jeff King
  2015-06-22 10:33       ` Jeff King
  2015-06-22 11:38       ` Charles Bailey
  2015-06-22  9:57     ` Duy Nguyen
  1 sibling, 2 replies; 51+ messages in thread
From: Jeff King @ 2015-06-22  8:38 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

On Sun, Jun 21, 2015 at 08:20:31PM +0100, Charles Bailey wrote:

> +OPTIONS
> +-------
> +
> +-v::
> +--verbose::
> +	Output in the followin format instead of just printing object ids:
> +	<sha1> SP <type> SP <size>

s/followin/&g/

> +int cmd_list_all_objects(int argc, const char **argv, const char *prefix)
> +{
> +	struct packed_git *p;
> +
> +	argc = parse_options(argc, argv, prefix, builtin_filter_objects_options,
> +			     NULL, 0);
> +
> +	for_each_loose_object(check_loose_object, NULL, 0);
> +
> +	prepare_packed_git();
> +	for (p = packed_git; p; p = p->next) {
> +		open_pack_index(p);
> +	}

Yikes. The fact that you need to do this means that
for_each_packed_object is buggy, IMHO. I'll send a patch.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] Add list-all-objects command
  2015-06-21 19:20   ` [PATCH] Add list-all-objects command Charles Bailey
  2015-06-22  8:38     ` Jeff King
@ 2015-06-22  9:57     ` Duy Nguyen
  2015-06-22 10:24       ` Jeff King
  1 sibling, 1 reply; 51+ messages in thread
From: Duy Nguyen @ 2015-06-22  9:57 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, Git Mailing List

On Mon, Jun 22, 2015 at 2:20 AM, Charles Bailey <charles@hashpling.org> wrote:
> From: Charles Bailey <cbailey32@bloomberg.net>
>
> list-all-objects is a command to print the ids of all objects in the
> object database of a repository. It is designed as a low overhead
> interface for scripts that want to analyse all objects but don't require
> the ordering implied by a revision walk.
>
> It will list all objects, loose and packed, and will include unreachable
> objects.

Nit picking, but perhaps we should allow to select object source:
loose, packed, alternates.. These info are available now and cheap to
get. It's ok not to do it now though.

Personally I would name this command "find-objects" (after unix
command "find") where we could still filter objects _not_ based on
object content.
-- 
Duy

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] Add list-all-objects command
  2015-06-22  9:57     ` Duy Nguyen
@ 2015-06-22 10:24       ` Jeff King
  0 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2015-06-22 10:24 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Charles Bailey, Junio C Hamano, Git Mailing List

On Mon, Jun 22, 2015 at 04:57:28PM +0700, Duy Nguyen wrote:

> On Mon, Jun 22, 2015 at 2:20 AM, Charles Bailey <charles@hashpling.org> wrote:
> > From: Charles Bailey <cbailey32@bloomberg.net>
> >
> > list-all-objects is a command to print the ids of all objects in the
> > object database of a repository. It is designed as a low overhead
> > interface for scripts that want to analyse all objects but don't require
> > the ordering implied by a revision walk.
> >
> > It will list all objects, loose and packed, and will include unreachable
> > objects.
> 
> Nit picking, but perhaps we should allow to select object source:
> loose, packed, alternates.. These info are available now and cheap to
> get. It's ok not to do it now though.

There is already plumbing to do those individual operations if you want.
Although some of the plumbing involves "for i in objects/pack/*.pack",
which is perhaps a little less abstract than we'd like. :)

> Personally I would name this command "find-objects" (after unix
> command "find") where we could still filter objects _not_ based on
> object content.

I like that better than "ls", too, but I propose that we actually add
this as a feature to cat-file. I'll send patches in a moment.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] Add list-all-objects command
  2015-06-22  8:38     ` Jeff King
@ 2015-06-22 10:33       ` Jeff King
  2015-06-22 10:40         ` [PATCH 1/7] for_each_packed_object: automatically open pack index Jeff King
                           ` (9 more replies)
  2015-06-22 11:38       ` Charles Bailey
  1 sibling, 10 replies; 51+ messages in thread
From: Jeff King @ 2015-06-22 10:33 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

On Mon, Jun 22, 2015 at 04:38:22AM -0400, Jeff King wrote:

> > +	prepare_packed_git();
> > +	for (p = packed_git; p; p = p->next) {
> > +		open_pack_index(p);
> > +	}
> 
> Yikes. The fact that you need to do this means that
> for_each_packed_object is buggy, IMHO. I'll send a patch.

Here's that patch. And since I did not want to pile work on Charles, I
went ahead and just implemented the patches I suggested in the other
email.

We may want to take patch 1 separately for the maint-track, as it is
really a bug-fix (albeit one that I do not think actually affects anyone
in practice right now).

Patches 2-5 are useful even if we go with Charles' command, as they make
cat-file better (cleanups and he new buffer option).

Patches 6-7 implement the cat-file option that would be redundant with
list-all-objects.

By the way, in addition to not showing objects in order,
list-all-objects (and my cat-file option) may show duplicates. Do we
want to "sort -u" for the user? It might be nice for them to always get
a de-duped and sorted list. Aside from the CPU cost of sorting, it does
mean we'll allocate ~80MB for the kernel to store the sha1s. I guess
that's not too much when you are talking about the kernel repo. I took
the coward's way out and just mentioned the limitation in the
documentation, but I'm happy to be persuaded.

  [1/7]: for_each_packed_object: automatically open pack index
  [2/7]: cat-file: minor style fix in options list
  [3/7]: cat-file: move batch_options definition to top of file
  [4/7]: cat-file: add --buffer option
  [5/7]: cat-file: stop returning value from batch_one_object
  [6/7]: cat-file: split batch_one_object into two stages
  [7/7]: cat-file: add --batch-all-objects option

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 1/7] for_each_packed_object: automatically open pack index
  2015-06-22 10:33       ` Jeff King
@ 2015-06-22 10:40         ` Jeff King
  2015-06-22 10:40         ` [PATCH 2/7] cat-file: minor style fix in options list Jeff King
                           ` (8 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2015-06-22 10:40 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

When for_each_packed_object is called, we call
prepare_packed_git() to make sure we have the actual list of
packs. But the latter does not actually open the pack
indices, meaning that pack->nr_objects may simply be 0 if
the pack has not otherwise been used since the program
started.

In practice, this didn't come up for the current callers,
because they iterate the packed objects only after iterating
all reachable objects (so for it to matter you would have to
have a pack consisting only of unreachable objects). But it
is a dangerous and confusing interface that should be fixed
for future callers.

Note that we do not end the iteration when a pack cannot be
opened, but we do return an error. That lets you complete
the iteration even in actively-repacked repository where an
.idx file may racily go away, but it also lets callers know
that they may not have gotten the complete list (which the
current reachability-check caller does care about).

We have to tweak one of the prune tests due to the changed
return value; an earlier test creates bogus .idx files and
does not clean them up. Having to make this tweak is a good
thing; it means we will not prune in a broken repository,
and the test confirms that we do not negatively impact a
more lenient caller, count-objects.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c      | 7 ++++++-
 t/t5304-prune.sh | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 5038475..f1f0efb 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3573,14 +3573,19 @@ int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags)
 {
 	struct packed_git *p;
 	int r = 0;
+	int pack_errors = 0;
 
 	prepare_packed_git();
 	for (p = packed_git; p; p = p->next) {
 		if ((flags & FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local)
 			continue;
+		if (open_pack_index(p)) {
+			pack_errors = 1;
+			continue;
+		}
 		r = for_each_object_in_pack(p, cb, data);
 		if (r)
 			break;
 	}
-	return r;
+	return r ? r : pack_errors;
 }
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 0794d33..023d7c6 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -218,6 +218,7 @@ test_expect_success 'gc: prune old objects after local clone' '
 '
 
 test_expect_success 'garbage report in count-objects -v' '
+	test_when_finished "rm -f .git/objects/pack/fake*" &&
 	: >.git/objects/pack/foo &&
 	: >.git/objects/pack/foo.bar &&
 	: >.git/objects/pack/foo.keep &&
-- 
2.4.4.719.g3984bc6

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 2/7] cat-file: minor style fix in options list
  2015-06-22 10:33       ` Jeff King
  2015-06-22 10:40         ` [PATCH 1/7] for_each_packed_object: automatically open pack index Jeff King
@ 2015-06-22 10:40         ` Jeff King
  2015-06-22 10:41         ` [PATCH 3/7] cat-file: move batch_options definition to top of file Jeff King
                           ` (7 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2015-06-22 10:40 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

We do not put extra whitespace before the first macro
argument.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/cat-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 049a95f..6cbcccc 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -412,7 +412,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		OPT_CMDMODE('p', NULL, &opt, N_("pretty-print object's content"), 'p'),
 		OPT_CMDMODE(0, "textconv", &opt,
 			    N_("for blob objects, run textconv on object's content"), 'c'),
-		OPT_BOOL( 0, "allow-unknown-type", &unknown_type,
+		OPT_BOOL(0, "allow-unknown-type", &unknown_type,
 			  N_("allow -s and -t to work with broken/corrupt objects")),
 		{ OPTION_CALLBACK, 0, "batch", &batch, "format",
 			N_("show info and content of objects fed from the standard input"),
-- 
2.4.4.719.g3984bc6

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 3/7] cat-file: move batch_options definition to top of file
  2015-06-22 10:33       ` Jeff King
  2015-06-22 10:40         ` [PATCH 1/7] for_each_packed_object: automatically open pack index Jeff King
  2015-06-22 10:40         ` [PATCH 2/7] cat-file: minor style fix in options list Jeff King
@ 2015-06-22 10:41         ` Jeff King
  2015-06-22 10:45         ` [PATCH 4/7] cat-file: add --buffer option Jeff King
                           ` (6 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2015-06-22 10:41 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

That way all of the functions can make use of it.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/cat-file.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 6cbcccc..d4101b7 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -10,6 +10,13 @@
 #include "streaming.h"
 #include "tree-walk.h"
 
+struct batch_options {
+	int enabled;
+	int follow_symlinks;
+	int print_contents;
+	const char *format;
+};
+
 static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 			int unknown_type)
 {
@@ -232,12 +239,6 @@ static void print_object_or_die(int fd, struct expand_data *data)
 	}
 }
 
-struct batch_options {
-	int enabled;
-	int follow_symlinks;
-	int print_contents;
-	const char *format;
-};
 
 static int batch_one_object(const char *obj_name, struct batch_options *opt,
 			    struct expand_data *data)
-- 
2.4.4.719.g3984bc6

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 4/7] cat-file: add --buffer option
  2015-06-22 10:33       ` Jeff King
                           ` (2 preceding siblings ...)
  2015-06-22 10:41         ` [PATCH 3/7] cat-file: move batch_options definition to top of file Jeff King
@ 2015-06-22 10:45         ` Jeff King
  2015-06-22 10:45         ` [PATCH 5/7] cat-file: stop returning value from batch_one_object Jeff King
                           ` (5 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2015-06-22 10:45 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

We use a direct write() to output the results of --batch and
--batch-check. This is good for processes feeding the input
and reading the output interactively, but it introduces
measurable overhead if you do not want this feature. For
example, on linux.git:

  $ git rev-list --objects --all | cut -d' ' -f1 >objects
  $ time git cat-file --batch-check='%(objectsize)' \
          <objects >/dev/null
  real    0m5.440s
  user    0m5.060s
  sys     0m0.384s

This patch adds an option to use regular stdio buffering:

  $ time git cat-file --batch-check='%(objectsize)' \
          --buffer <objects >/dev/null
  real    0m4.975s
  user    0m4.888s
  sys     0m0.092s

Signed-off-by: Jeff King <peff@peff.net>
---
This selectively uses fwrite or write_or_die, depending on the buffer
setting. Another option would be to just always use fwrite(), and then
selectively fflush(). It feels kind of wasteful in the non-buffered
case, as it's just another layer to write through. OTOH, the cost of
writing a line into the buffer only to flush is probably dwarfed by the
system call of actually flushing.

If we went that direction, we could probably simplify the code a bit
(both getting rid of the batch_write function I call here, and dropping
a bunch of existing fflush() calls, where we must flush any time we use
printf for its formatting capabilities).

I also considered that the "--buffer" case is likely to be the common
one. We cannot flip the default, though, as it would break any existing
callers (who would need to specify "--no-buffer"). We can do the usual
deprecation dance, but I don't know if it is worth it for a plumbing
command like this.

 Documentation/git-cat-file.txt |  7 +++++++
 builtin/cat-file.c             | 26 +++++++++++++++++++-------
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 319ab4c..0058bd4 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -69,6 +69,13 @@ OPTIONS
 	not be combined with any other options or arguments.  See the
 	section `BATCH OUTPUT` below for details.
 
+--buffer::
+	Normally batch output is flushed after each object is output, so
+	that a process can interactively read and write from
+	`cat-file`. With this option, the output uses normal stdio
+	buffering; this is much more efficient when invoking
+	`--batch-check` on a large number of objects.
+
 --allow-unknown-type::
 	Allow -s or -t to query broken/corrupt objects of unknown type.
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index d4101b7..741e100 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -14,6 +14,7 @@ struct batch_options {
 	int enabled;
 	int follow_symlinks;
 	int print_contents;
+	int buffer_output;
 	const char *format;
 };
 
@@ -211,14 +212,25 @@ static size_t expand_format(struct strbuf *sb, const char *start, void *data)
 	return end - start + 1;
 }
 
-static void print_object_or_die(int fd, struct expand_data *data)
+static void batch_write(struct batch_options *opt, const void *data, int len)
+{
+	if (opt->buffer_output) {
+		if (fwrite(data, 1, len, stdout) != len)
+			die_errno("unable to write to stdout");
+	} else
+		write_or_die(1, data, len);
+}
+
+static void print_object_or_die(struct batch_options *opt, struct expand_data *data)
 {
 	const unsigned char *sha1 = data->sha1;
 
 	assert(data->info.typep);
 
 	if (data->type == OBJ_BLOB) {
-		if (stream_blob_to_fd(fd, sha1, NULL, 0) < 0)
+		if (opt->buffer_output)
+			fflush(stdout);
+		if (stream_blob_to_fd(1, sha1, NULL, 0) < 0)
 			die("unable to stream %s to stdout", sha1_to_hex(sha1));
 	}
 	else {
@@ -234,12 +246,11 @@ static void print_object_or_die(int fd, struct expand_data *data)
 		if (data->info.sizep && size != data->size)
 			die("object %s changed size!?", sha1_to_hex(sha1));
 
-		write_or_die(fd, contents, size);
+		batch_write(opt, contents, size);
 		free(contents);
 	}
 }
 
-
 static int batch_one_object(const char *obj_name, struct batch_options *opt,
 			    struct expand_data *data)
 {
@@ -294,12 +305,12 @@ static int batch_one_object(const char *obj_name, struct batch_options *opt,
 
 	strbuf_expand(&buf, opt->format, expand_format, data);
 	strbuf_addch(&buf, '\n');
-	write_or_die(1, buf.buf, buf.len);
+	batch_write(opt, buf.buf, buf.len);
 	strbuf_release(&buf);
 
 	if (opt->print_contents) {
-		print_object_or_die(1, data);
-		write_or_die(1, "\n", 1);
+		print_object_or_die(opt, data);
+		batch_write(opt, "\n", 1);
 	}
 	return 0;
 }
@@ -415,6 +426,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			    N_("for blob objects, run textconv on object's content"), 'c'),
 		OPT_BOOL(0, "allow-unknown-type", &unknown_type,
 			  N_("allow -s and -t to work with broken/corrupt objects")),
+		OPT_BOOL(0, "buffer", &batch.buffer_output, N_("buffer --batch output")),
 		{ OPTION_CALLBACK, 0, "batch", &batch, "format",
 			N_("show info and content of objects fed from the standard input"),
 			PARSE_OPT_OPTARG, batch_option_callback },
-- 
2.4.4.719.g3984bc6

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 5/7] cat-file: stop returning value from batch_one_object
  2015-06-22 10:33       ` Jeff King
                           ` (3 preceding siblings ...)
  2015-06-22 10:45         ` [PATCH 4/7] cat-file: add --buffer option Jeff King
@ 2015-06-22 10:45         ` Jeff King
  2015-06-22 10:45         ` [PATCH 6/7] cat-file: split batch_one_object into two stages Jeff King
                           ` (4 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2015-06-22 10:45 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

If batch_one_object returns an error code, we stop reading
input.  However, it will only do so if we feed it NULL,
which cannot happen; we give it the "buf" member of a
strbuf, which is always non-NULL.

We did originally stop on other errors (like a missing
object), but this was changed in 3c076db (cat-file --batch /
--batch-check: do not exit if hashes are missing,
2008-06-09). These days we keep going for any per-object
error (and print "missing" when necessary).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/cat-file.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 741e100..7d99c15 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -251,17 +251,14 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	}
 }
 
-static int batch_one_object(const char *obj_name, struct batch_options *opt,
-			    struct expand_data *data)
+static void batch_one_object(const char *obj_name, struct batch_options *opt,
+			     struct expand_data *data)
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct object_context ctx;
 	int flags = opt->follow_symlinks ? GET_SHA1_FOLLOW_SYMLINKS : 0;
 	enum follow_symlinks_result result;
 
-	if (!obj_name)
-	   return 1;
-
 	result = get_sha1_with_context(obj_name, flags, data->sha1, &ctx);
 	if (result != FOUND) {
 		switch (result) {
@@ -286,7 +283,7 @@ static int batch_one_object(const char *obj_name, struct batch_options *opt,
 			break;
 		}
 		fflush(stdout);
-		return 0;
+		return;
 	}
 
 	if (ctx.mode == 0) {
@@ -294,13 +291,13 @@ static int batch_one_object(const char *obj_name, struct batch_options *opt,
 		       (uintmax_t)ctx.symlink_path.len,
 		       ctx.symlink_path.buf);
 		fflush(stdout);
-		return 0;
+		return;
 	}
 
 	if (sha1_object_info_extended(data->sha1, &data->info, LOOKUP_REPLACE_OBJECT) < 0) {
 		printf("%s missing\n", obj_name);
 		fflush(stdout);
-		return 0;
+		return;
 	}
 
 	strbuf_expand(&buf, opt->format, expand_format, data);
@@ -312,7 +309,6 @@ static int batch_one_object(const char *obj_name, struct batch_options *opt,
 		print_object_or_die(opt, data);
 		batch_write(opt, "\n", 1);
 	}
-	return 0;
 }
 
 static int batch_objects(struct batch_options *opt)
@@ -367,9 +363,7 @@ static int batch_objects(struct batch_options *opt)
 			data.rest = p;
 		}
 
-		retval = batch_one_object(buf.buf, opt, &data);
-		if (retval)
-			break;
+		batch_one_object(buf.buf, opt, &data);
 	}
 
 	strbuf_release(&buf);
-- 
2.4.4.719.g3984bc6

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 6/7] cat-file: split batch_one_object into two stages
  2015-06-22 10:33       ` Jeff King
                           ` (4 preceding siblings ...)
  2015-06-22 10:45         ` [PATCH 5/7] cat-file: stop returning value from batch_one_object Jeff King
@ 2015-06-22 10:45         ` Jeff King
  2015-06-22 10:45         ` [PATCH 7/7] cat-file: add --batch-all-objects option Jeff King
                           ` (3 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2015-06-22 10:45 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

There are really two things going on in this function:

  1. We convert the name we got on stdin to a sha1.

  2. We look up and print information on the sha1.

Let's split out the second half so that we can call it
separately.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/cat-file.c | 39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7d99c15..499ccda 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -251,10 +251,31 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	}
 }
 
+static void batch_object_write(const char *obj_name, struct batch_options *opt,
+			       struct expand_data *data)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	if (sha1_object_info_extended(data->sha1, &data->info, LOOKUP_REPLACE_OBJECT) < 0) {
+		printf("%s missing\n", obj_name);
+		fflush(stdout);
+		return;
+	}
+
+	strbuf_expand(&buf, opt->format, expand_format, data);
+	strbuf_addch(&buf, '\n');
+	batch_write(opt, buf.buf, buf.len);
+	strbuf_release(&buf);
+
+	if (opt->print_contents) {
+		print_object_or_die(opt, data);
+		batch_write(opt, "\n", 1);
+	}
+}
+
 static void batch_one_object(const char *obj_name, struct batch_options *opt,
 			     struct expand_data *data)
 {
-	struct strbuf buf = STRBUF_INIT;
 	struct object_context ctx;
 	int flags = opt->follow_symlinks ? GET_SHA1_FOLLOW_SYMLINKS : 0;
 	enum follow_symlinks_result result;
@@ -294,21 +315,7 @@ static void batch_one_object(const char *obj_name, struct batch_options *opt,
 		return;
 	}
 
-	if (sha1_object_info_extended(data->sha1, &data->info, LOOKUP_REPLACE_OBJECT) < 0) {
-		printf("%s missing\n", obj_name);
-		fflush(stdout);
-		return;
-	}
-
-	strbuf_expand(&buf, opt->format, expand_format, data);
-	strbuf_addch(&buf, '\n');
-	batch_write(opt, buf.buf, buf.len);
-	strbuf_release(&buf);
-
-	if (opt->print_contents) {
-		print_object_or_die(opt, data);
-		batch_write(opt, "\n", 1);
-	}
+	batch_object_write(obj_name, opt, data);
 }
 
 static int batch_objects(struct batch_options *opt)
-- 
2.4.4.719.g3984bc6

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 7/7] cat-file: add --batch-all-objects option
  2015-06-22 10:33       ` Jeff King
                           ` (5 preceding siblings ...)
  2015-06-22 10:45         ` [PATCH 6/7] cat-file: split batch_one_object into two stages Jeff King
@ 2015-06-22 10:45         ` Jeff King
  2015-06-26  6:56           ` Eric Sunshine
  2015-06-22 11:06         ` [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects Jeff King
                           ` (2 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2015-06-22 10:45 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

It can sometimes be useful to examine all objects in the
repository. Normally this is done with "git rev-list --all
--objects", but:

  1. That shows only reachable objects. You may want to look
     at all available objects.

  2. It's slow. We actually open each object to walk the
     graph. If your operation is OK with seeing unreachable
     objects, it's an order of magnitude faster to just
     enumerate the loose directories and pack indices.

You can do this yourself using "ls" and "git show-index",
but it's non-obvious.  This patch adds an option to
"cat-file --batch-check" to operate on all available
objects (rather than reading names from stdin).

This is based on a proposal by Charles Bailey to provide a
separate "git list-all-objects" command. That is more
orthogonal, as it splits enumerating the objects from
getting information about them. However, in practice you
will either:

  a. Feed the list of objects directly into cat-file anyway,
     so you can find out information about them. Keeping it
     in a single process is more efficient.

  b. Ask the listing process to start telling you more
     information about the objects, in which case you will
     reinvent cat-file's batch-check formatter.

Adding a cat-file option is simple and efficient. And if you
really do want just the object names, you can always do:

  git cat-file --batch-check='%(objectname)' --batch-all-objects

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-cat-file.txt |  8 ++++++++
 builtin/cat-file.c             | 44 ++++++++++++++++++++++++++++++++++++++++--
 t/t1006-cat-file.sh            | 27 ++++++++++++++++++++++++++
 3 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 0058bd4..6831b08 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -69,6 +69,14 @@ OPTIONS
 	not be combined with any other options or arguments.  See the
 	section `BATCH OUTPUT` below for details.
 
+--batch-all-objects::
+	Instead of reading a list of objects on stdin, perform the
+	requested batch operation on all objects in the repository and
+	any alternate object stores (not just reachable objects).
+	Requires `--batch` or `--batch-check` be specified. Note that
+	the order of the objects is unspecified, and there may be
+	duplicate entries.
+
 --buffer::
 	Normally batch output is flushed after each object is output, so
 	that a process can interactively read and write from
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 499ccda..95604c4 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -15,6 +15,7 @@ struct batch_options {
 	int follow_symlinks;
 	int print_contents;
 	int buffer_output;
+	int all_objects;
 	const char *format;
 };
 
@@ -257,7 +258,7 @@ static void batch_object_write(const char *obj_name, struct batch_options *opt,
 	struct strbuf buf = STRBUF_INIT;
 
 	if (sha1_object_info_extended(data->sha1, &data->info, LOOKUP_REPLACE_OBJECT) < 0) {
-		printf("%s missing\n", obj_name);
+		printf("%s missing\n", obj_name ? obj_name : sha1_to_hex(data->sha1));
 		fflush(stdout);
 		return;
 	}
@@ -318,6 +319,34 @@ static void batch_one_object(const char *obj_name, struct batch_options *opt,
 	batch_object_write(obj_name, opt, data);
 }
 
+struct object_cb_data {
+	struct batch_options *opt;
+	struct expand_data *expand;
+};
+
+static int batch_object_cb(const unsigned char *sha1,
+			   struct object_cb_data *data)
+{
+	hashcpy(data->expand->sha1, sha1);
+	batch_object_write(NULL, data->opt, data->expand);
+	return 0;
+}
+
+static int batch_loose_object(const unsigned char *sha1,
+			      const char *path,
+			      void *data)
+{
+	return batch_object_cb(sha1, data);
+}
+
+static int batch_packed_object(const unsigned char *sha1,
+			       struct packed_git *pack,
+			       uint32_t pos,
+			       void *data)
+{
+	return batch_object_cb(sha1, data);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf buf = STRBUF_INIT;
@@ -345,6 +374,15 @@ static int batch_objects(struct batch_options *opt)
 	if (opt->print_contents)
 		data.info.typep = &data.type;
 
+	if (opt->all_objects) {
+		struct object_cb_data cb;
+		cb.opt = opt;
+		cb.expand = &data;
+		for_each_loose_object(batch_loose_object, &cb, 0);
+		for_each_packed_object(batch_packed_object, &cb, 0);
+		return 0;
+	}
+
 	/*
 	 * We are going to call get_sha1 on a potentially very large number of
 	 * objects. In most large cases, these will be actual object sha1s. The
@@ -436,6 +474,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			PARSE_OPT_OPTARG, batch_option_callback },
 		OPT_BOOL(0, "follow-symlinks", &batch.follow_symlinks,
 			 N_("follow in-tree symlinks (used with --batch or --batch-check)")),
+		OPT_BOOL(0, "batch-all-objects", &batch.all_objects,
+			 N_("show all objects with --batch or --batch-check")),
 		OPT_END()
 	};
 
@@ -460,7 +500,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		usage_with_options(cat_file_usage, options);
 	}
 
-	if (batch.follow_symlinks && !batch.enabled) {
+	if ((batch.follow_symlinks || batch.all_objects) && !batch.enabled) {
 		usage_with_options(cat_file_usage, options);
 	}
 
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 93a4794..2b4220a 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -547,4 +547,31 @@ test_expect_success 'git cat-file --batch --follow-symlink returns correct sha a
 	test_cmp expect actual
 '
 
+test_expect_success 'cat-file --batch-all-objects shows all objects' '
+	# make new repos so we now the full set of objects; we will
+	# also make sure that there are some packed and some loose
+	# objects, some referenced and some not, and that there are
+	# some available only via alternates.
+	git init all-one &&
+	(
+		cd all-one &&
+		echo content >file &&
+		git add file &&
+		git commit -qm base &&
+		git rev-parse HEAD HEAD^{tree} HEAD:file &&
+		git repack -ad &&
+		echo not-cloned | git hash-object -w --stdin
+	) >expect.unsorted &&
+	git clone -s all-one all-two &&
+	(
+		cd all-two &&
+		echo local-unref | git hash-object -w --stdin
+	) >>expect.unsorted &&
+	sort <expect.unsorted >expect &&
+	git -C all-two cat-file --batch-all-objects \
+				--batch-check="%(objectname)" >actual.unsorted &&
+	sort <actual.unsorted >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.4.4.719.g3984bc6

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects
  2015-06-22 10:33       ` Jeff King
                           ` (6 preceding siblings ...)
  2015-06-22 10:45         ` [PATCH 7/7] cat-file: add --batch-all-objects option Jeff King
@ 2015-06-22 11:06         ` Jeff King
  2015-06-22 22:03           ` Charles Bailey
  2015-06-22 21:48         ` [PATCH] Add list-all-objects command Charles Bailey
  2015-06-22 21:50         ` Junio C Hamano
  9 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2015-06-22 11:06 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

On Mon, Jun 22, 2015 at 06:33:21AM -0400, Jeff King wrote:

> By the way, in addition to not showing objects in order,
> list-all-objects (and my cat-file option) may show duplicates. Do we
> want to "sort -u" for the user? It might be nice for them to always get
> a de-duped and sorted list. Aside from the CPU cost of sorting, it does
> mean we'll allocate ~80MB for the kernel to store the sha1s. I guess
> that's not too much when you are talking about the kernel repo. I took
> the coward's way out and just mentioned the limitation in the
> documentation, but I'm happy to be persuaded.

The patch below does the sort/de-dup. I'd probably just squash it into
patch 7, though.

I did have one additional thought, though. We are treating this as two
separate operations: "what are the sha1s in the repo" and "show me
information about this sha1". But by integrating with cat-file, we could
actually show information not just about a particular sha1, but about a
particular on-disk object.

E.g., if there are duplicates of a particular object, some formatters
like "%(objectsize:disk)" and "%(deltabase)" pick one arbitrarily to
show. I don't know if anybody actually cares about that in practice, but
if we show duplicates, we could give the accurate information for each
instance (and in fact we could give other information like loose vs
packed, which file contains the object, etc).

I tend to think that the lack of de-duping is sufficiently confusing
that it should be the default, and we can always add a "no really, show
me the duplicates" option later. It is not as simple as skipping the
de-dup step. We'd have to actually avoid calling sha1_object_info, and
use the information found in the loose/pack traversal (which would in
turn require exposing the low-level bits of sha1_object_info).

-- >8 --
Subject: cat-file: sort and de-dup output of --batch-all-objects

The sorting we could probably live without, but printing
duplicates is just a hassle for the user, who must then
de-dup themselves (or risk a wrong answer if they are doing
something like counting objects with a particular property).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-cat-file.txt |  3 +--
 builtin/cat-file.c             | 22 +++++++++++++++-------
 t/t1006-cat-file.sh            |  3 +--
 3 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 6831b08..3105fc0 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -74,8 +74,7 @@ OPTIONS
 	requested batch operation on all objects in the repository and
 	any alternate object stores (not just reachable objects).
 	Requires `--batch` or `--batch-check` be specified. Note that
-	the order of the objects is unspecified, and there may be
-	duplicate entries.
+	the objects are visited in order sorted by their hashes.
 
 --buffer::
 	Normally batch output is flushed after each object is output, so
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 95604c4..07baad1 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -9,6 +9,7 @@
 #include "userdiff.h"
 #include "streaming.h"
 #include "tree-walk.h"
+#include "sha1-array.h"
 
 struct batch_options {
 	int enabled;
@@ -324,19 +325,19 @@ struct object_cb_data {
 	struct expand_data *expand;
 };
 
-static int batch_object_cb(const unsigned char *sha1,
-			   struct object_cb_data *data)
+static void batch_object_cb(const unsigned char sha1[20], void *vdata)
 {
+	struct object_cb_data *data = vdata;
 	hashcpy(data->expand->sha1, sha1);
 	batch_object_write(NULL, data->opt, data->expand);
-	return 0;
 }
 
 static int batch_loose_object(const unsigned char *sha1,
 			      const char *path,
 			      void *data)
 {
-	return batch_object_cb(sha1, data);
+	sha1_array_append(data, sha1);
+	return 0;
 }
 
 static int batch_packed_object(const unsigned char *sha1,
@@ -344,7 +345,8 @@ static int batch_packed_object(const unsigned char *sha1,
 			       uint32_t pos,
 			       void *data)
 {
-	return batch_object_cb(sha1, data);
+	sha1_array_append(data, sha1);
+	return 0;
 }
 
 static int batch_objects(struct batch_options *opt)
@@ -375,11 +377,17 @@ static int batch_objects(struct batch_options *opt)
 		data.info.typep = &data.type;
 
 	if (opt->all_objects) {
+		struct sha1_array sa = SHA1_ARRAY_INIT;
 		struct object_cb_data cb;
+
+		for_each_loose_object(batch_loose_object, &sa, 0);
+		for_each_packed_object(batch_packed_object, &sa, 0);
+
 		cb.opt = opt;
 		cb.expand = &data;
-		for_each_loose_object(batch_loose_object, &cb, 0);
-		for_each_packed_object(batch_packed_object, &cb, 0);
+		sha1_array_for_each_unique(&sa, batch_object_cb, &cb);
+
+		sha1_array_clear(&sa);
 		return 0;
 	}
 
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 2b4220a..18dbdc8 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -569,8 +569,7 @@ test_expect_success 'cat-file --batch-all-objects shows all objects' '
 	) >>expect.unsorted &&
 	sort <expect.unsorted >expect &&
 	git -C all-two cat-file --batch-all-objects \
-				--batch-check="%(objectname)" >actual.unsorted &&
-	sort <actual.unsorted >actual &&
+				--batch-check="%(objectname)" >actual &&
 	test_cmp expect actual
 '
 
-- 
2.4.4.719.g3984bc6

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH] Add list-all-objects command
  2015-06-22  8:38     ` Jeff King
  2015-06-22 10:33       ` Jeff King
@ 2015-06-22 11:38       ` Charles Bailey
  1 sibling, 0 replies; 51+ messages in thread
From: Charles Bailey @ 2015-06-22 11:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Mon, Jun 22, 2015 at 04:38:22AM -0400, Jeff King wrote:
> On Sun, Jun 21, 2015 at 08:20:31PM +0100, Charles Bailey wrote:
> 
> > +	prepare_packed_git();
> > +	for (p = packed_git; p; p = p->next) {
> > +		open_pack_index(p);
> > +	}
> 
> Yikes. The fact that you need to do this means that
> for_each_packed_object is buggy, IMHO. I'll send a patch.

I'm glad you said that; the interface did seem a bit warty at the time
but as I "fixed" this early in my hacking I didn't remeber to revisit
this and ask if it was actually intentional.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Fast enumeration of objects
  2015-06-22  8:35   ` Fast enumeration of objects Jeff King
@ 2015-06-22 19:44     ` Junio C Hamano
  0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2015-06-22 19:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Charles Bailey, git

Jeff King <peff@peff.net> writes:

> ...
> So my conclusions are:
>
>   1. Yes, the pipe/parsing overhead of a separate processor really is
>      measurable. That's hidden in the wall-clock time if you have
>      multiple cores, but you may care more about CPU time. I still think
>      the flexibility is worth it.
>
>   2. Cutting out the pipe to cat-file is worth doing, as it saves a few
>      seconds. Cutting out "%(objecttype)" saves a lot, too, and is worth
>      doing. We should teach "list-all-objects -v" to use cat-file's
>      custom formatters (alternatively, we could just teach cat-file a
>      "--batch-all-objects" option rather than add a new command).
>
>   3. We should teach cat-file a "--buffer" option to use fwrite. Even if
>      we end up with "list-all-objects --format='%(objectsize)'" for this
>      task, it would help all the other uses of cat-file.

All sounds very sensible.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] Add list-all-objects command
  2015-06-22 10:33       ` Jeff King
                           ` (7 preceding siblings ...)
  2015-06-22 11:06         ` [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects Jeff King
@ 2015-06-22 21:48         ` Charles Bailey
  2015-06-22 21:50         ` Junio C Hamano
  9 siblings, 0 replies; 51+ messages in thread
From: Charles Bailey @ 2015-06-22 21:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Mon, Jun 22, 2015 at 06:33:21AM -0400, Jeff King wrote:
> On Mon, Jun 22, 2015 at 04:38:22AM -0400, Jeff King wrote:
> 
> > > +	prepare_packed_git();
> > > +	for (p = packed_git; p; p = p->next) {
> > > +		open_pack_index(p);
> > > +	}
> > 
> > Yikes. The fact that you need to do this means that
> > for_each_packed_object is buggy, IMHO. I'll send a patch.
> 
> Here's that patch. And since I did not want to pile work on Charles, I
> went ahead and just implemented the patches I suggested in the other
> email.

I have to say that I think that adding this functionality to cat-file
makes a lot of sense. If it only catted files it might be a stretch but
as it's already grown --batch-check functionality, it now seems a
reasonable extension. I'm not particularly attached to having a
standalone "list-all-objects" command per se.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] Add list-all-objects command
  2015-06-22 10:33       ` Jeff King
                           ` (8 preceding siblings ...)
  2015-06-22 21:48         ` [PATCH] Add list-all-objects command Charles Bailey
@ 2015-06-22 21:50         ` Junio C Hamano
  2015-06-22 23:50           ` Jeff King
  9 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2015-06-22 21:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Charles Bailey, git

Jeff King <peff@peff.net> writes:

> On Mon, Jun 22, 2015 at 04:38:22AM -0400, Jeff King wrote:
>
>> > +	prepare_packed_git();
>> > +	for (p = packed_git; p; p = p->next) {
>> > +		open_pack_index(p);
>> > +	}
>> 
>> Yikes. The fact that you need to do this means that
>> for_each_packed_object is buggy, IMHO. I'll send a patch.
>
> Here's that patch. And since I did not want to pile work on Charles, I
> went ahead and just implemented the patches I suggested in the other
> email.
>
> We may want to take patch 1 separately for the maint-track, as it is
> really a bug-fix (albeit one that I do not think actually affects anyone
> in practice right now).

Hmph, add_unseen_recent_objects_to_traversal() is the only existing
user, and before d3038d22 (prune: keep objects reachable from recent
objects, 2014-10-15) added that function, for-each-packed-object
existed but had no callers.

And the objects not beeing seen by that function (due to lack of
"open") would matter only for pruning purposes, which would mean
you have to be calling into the codepath when running a full repack,
so you would have opened all the packs that matter anyway (if you
have a "old cruft archive" pack that contains only objects that
are unreachable, you may not have opened that pack, though, and you
may prune the thing away prematurely).

So, I think I can agree that this would unlikely affect anybody in
practice.

> Patches 2-5 are useful even if we go with Charles' command, as they make
> cat-file better (cleanups and he new buffer option).
>
> Patches 6-7 implement the cat-file option that would be redundant with
> list-all-objects.
>
> By the way, in addition to not showing objects in order,
> list-all-objects (and my cat-file option) may show duplicates. Do we
> want to "sort -u" for the user? It might be nice for them to always get
> a de-duped and sorted list. Aside from the CPU cost of sorting, it does
> mean we'll allocate ~80MB for the kernel to store the sha1s. I guess
> that's not too much when you are talking about the kernel repo. I took
> the coward's way out and just mentioned the limitation in the
> documentation, but I'm happy to be persuaded.
>
>   [1/7]: for_each_packed_object: automatically open pack index
>   [2/7]: cat-file: minor style fix in options list
>   [3/7]: cat-file: move batch_options definition to top of file
>   [4/7]: cat-file: add --buffer option
>   [5/7]: cat-file: stop returning value from batch_one_object
>   [6/7]: cat-file: split batch_one_object into two stages
>   [7/7]: cat-file: add --batch-all-objects option
>
> -Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c
  2015-06-21 18:30     ` Charles Bailey
@ 2015-06-22 22:03       ` Junio C Hamano
  0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2015-06-22 22:03 UTC (permalink / raw)
  To: Charles Bailey; +Cc: git

Charles Bailey <charles@hashpling.org> writes:

> On Sun, Jun 21, 2015 at 07:25:44PM +0100, Charles Bailey wrote:
>> From: Charles Bailey <cbailey32@bloomberg.net>
>> 
>> diff --git a/parse-options.c b/parse-options.c
>> index 80106c0..101b649 100644
>> --- a/parse-options.c
>> +++ b/parse-options.c
>> @@ -180,6 +180,23 @@ static int get_value(struct parse_opt_ctx_t *p,
>>  			return opterror(opt, "expects a numerical value", flags);
>>  		return 0;
>>  
>> +	case OPTION_MAGNITUDE:
>> +		if (unset) {
>> +			*(unsigned long *)opt->value = 0;
>> +			return 0;
>> +		}
>> +		if (opt->flags & PARSE_OPT_OPTARG && !p->opt) {
>> +			*(unsigned long *)opt->value = opt->defval;
>> +			return 0;
>> +		}
>> +		if (get_arg(p, opt, flags, &arg))
>> +			return -1;
>> +		if (!git_parse_ulong(arg, opt->value))
>> +			return opterror(opt,
>> +				"expects a integer value with an optional k/m/g suffix",
>> +				flags);
>> +		return 0;
>> +
>
> Spotted after sending:
> s/expects a integer/expects an integer/

Thanks.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects
  2015-06-22 11:06         ` [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects Jeff King
@ 2015-06-22 22:03           ` Charles Bailey
  2015-06-22 23:46             ` Jeff King
  0 siblings, 1 reply; 51+ messages in thread
From: Charles Bailey @ 2015-06-22 22:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Mon, Jun 22, 2015 at 07:06:32AM -0400, Jeff King wrote:
> On Mon, Jun 22, 2015 at 06:33:21AM -0400, Jeff King wrote:
> 
> > By the way, in addition to not showing objects in order,
> > list-all-objects (and my cat-file option) may show duplicates. Do we
> > want to "sort -u" for the user? It might be nice for them to always get
> > a de-duped and sorted list. Aside from the CPU cost of sorting, it does
> > mean we'll allocate ~80MB for the kernel to store the sha1s. I guess
> > that's not too much when you are talking about the kernel repo. I took
> > the coward's way out and just mentioned the limitation in the
> > documentation, but I'm happy to be persuaded.
> 
> The patch below does the sort/de-dup. I'd probably just squash it into
> patch 7, though.

Woah, 8 out of 7! Did you get a chance to measure the performance hit of
the sort? If not, I may test it out when I next get the chance.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c
  2015-06-21 18:25   ` [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c Charles Bailey
  2015-06-21 18:30     ` Charles Bailey
@ 2015-06-22 22:08     ` Junio C Hamano
  1 sibling, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2015-06-22 22:08 UTC (permalink / raw)
  To: Charles Bailey; +Cc: git

Charles Bailey <charles@hashpling.org> writes:

> From: Charles Bailey <cbailey32@bloomberg.net>
>
> The unsigned long option parsing (including 'k'/'m'/'g' suffix parsing)
> is more widely applicable. Add support for OPT_MAGNITUDE to
> parse-options.h and change pack-objects.c use this support.
>
> The error behavior on parse errors follows that of OPT_INTEGER.
> The name of the option that failed to parse is reported with a brief
> message describing the expect format for the option argument and then
> the full usage message for the command invoked.
>
> This is differs from the previous behavior for OPT_ULONG used in

s/is //; (locally fixed--no need to resend).

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Improvements to integer option parsing
  2015-06-21 18:25 ` Improvements to integer option parsing Charles Bailey
  2015-06-21 18:25   ` [PATCH 1/2] Correct test-parse-options to handle negative ints Charles Bailey
  2015-06-21 18:25   ` [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c Charles Bailey
@ 2015-06-22 22:09   ` Junio C Hamano
  2015-06-22 22:42     ` Charles Bailey
  2 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2015-06-22 22:09 UTC (permalink / raw)
  To: Charles Bailey; +Cc: git

Charles Bailey <charles@hashpling.org> writes:

> This is a re-roll of the first two patches in my previous series which used to
> include "filter-objects" which is now a separate topic.
>
> [PATCH 1/2] Correct test-parse-options to handle negative ints
>
> The first one has changed only in that I've moved the additional test to a more
> logical place in the test file.
>
> [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c
>
> I've made the following changes to the second commit:
>
> - renamed this OPT_MAGNITUDE to try and convey something that is
> both unsigned and might benefit from a 'scale' suffix. I'm expecting
> more discussion on the name!

I think that name is very sensible.

> - marginally improved the opterror message on failed parses

I'd queue with "s/a integer/a non-negative integer/".

> - noted the change in behavior for the error messages generated for
> pack-objects' --max-pack-size and --window-memory in the commit message

Thanks.  Queued.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Improvements to integer option parsing
  2015-06-22 22:09   ` Improvements to integer option parsing Junio C Hamano
@ 2015-06-22 22:42     ` Charles Bailey
  0 siblings, 0 replies; 51+ messages in thread
From: Charles Bailey @ 2015-06-22 22:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git


> On 22 Jun 2015, at 23:09, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Charles Bailey <charles@hashpling.org> writes:
>> 
>> - marginally improved the opterror message on failed parses
> 
> I'd queue with "s/a integer/a non-negative integer/".

Ha! That's what I had before I submitted, but then the source line got quite long (which could have been split) and the generated message got quite long as well so I cropped it. This was probably the source of the grammar mistake.

If you're happy with the longer message, I am happy with it too.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects
  2015-06-22 22:03           ` Charles Bailey
@ 2015-06-22 23:46             ` Jeff King
  0 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2015-06-22 23:46 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, git

On Mon, Jun 22, 2015 at 11:03:50PM +0100, Charles Bailey wrote:

> > The patch below does the sort/de-dup. I'd probably just squash it into
> > patch 7, though.
> 
> Woah, 8 out of 7! Did you get a chance to measure the performance hit of
> the sort? If not, I may test it out when I next get the chance.

No, that last patch was my "eh, one more thing before bed" patch. ;)

It's easy enough to time, though. Running:

  git cat-file --batch-all-objects \
               --batch-check='%(objectsize) %(objectname)' \
	       --buffer >/dev/null

on linux.git, my best-of-five goes from (no sorting):

  real    0m3.604s
  user    0m3.556s
  sys     0m0.048s

to (with sorting):

  real    0m4.053s
  user    0m4.004s
  sys     0m0.052s

So it does matter, but not too much. We could de-dup with a hash table,
which might be a little faster, but I doubt it would make much
difference.  It's also mostly in sorted order already; it's possible
that a merge sort would behave a little better. I'm not sure how deep
it's worth going into that rabbit hole.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] Add list-all-objects command
  2015-06-22 21:50         ` Junio C Hamano
@ 2015-06-22 23:50           ` Jeff King
  0 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2015-06-22 23:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Charles Bailey, git

On Mon, Jun 22, 2015 at 02:50:10PM -0700, Junio C Hamano wrote:

> > We may want to take patch 1 separately for the maint-track, as it is
> > really a bug-fix (albeit one that I do not think actually affects anyone
> > in practice right now).
> 
> Hmph, add_unseen_recent_objects_to_traversal() is the only existing
> user, and before d3038d22 (prune: keep objects reachable from recent
> objects, 2014-10-15) added that function, for-each-packed-object
> existed but had no callers.

I think that is because it was added by d3038d22^. :)

> And the objects not beeing seen by that function (due to lack of
> "open") would matter only for pruning purposes, which would mean
> you have to be calling into the codepath when running a full repack,
> so you would have opened all the packs that matter anyway (if you
> have a "old cruft archive" pack that contains only objects that
> are unreachable, you may not have opened that pack, though, and you
> may prune the thing away prematurely).

Exactly, that matches my analysis.

> So, I think I can agree that this would unlikely affect anybody in
> practice.

Yep. I am OK if we do not even worry about it for "maint", then.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] cat-file: add --batch-all-objects option
  2015-06-22 10:45         ` [PATCH 7/7] cat-file: add --batch-all-objects option Jeff King
@ 2015-06-26  6:56           ` Eric Sunshine
  2015-06-26 15:48             ` Jeff King
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Sunshine @ 2015-06-26  6:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Charles Bailey, Junio C Hamano, Git List

On Mon, Jun 22, 2015 at 6:45 AM, Jeff King <peff@peff.net> wrote:
> [...] This patch adds an option to
> "cat-file --batch-check" to operate on all available
> objects (rather than reading names from stdin).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 93a4794..2b4220a 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -547,4 +547,31 @@ test_expect_success 'git cat-file --batch --follow-symlink returns correct sha a
>         test_cmp expect actual
>  '
>
> +test_expect_success 'cat-file --batch-all-objects shows all objects' '
> +       # make new repos so we now the full set of objects; we will

s/now/know/

> +       # also make sure that there are some packed and some loose
> +       # objects, some referenced and some not, and that there are
> +       # some available only via alternates.
> +       git init all-one &&
> +       (
> +               cd all-one &&
> +               echo content >file &&
> +               git add file &&
> +               git commit -qm base &&
> +               git rev-parse HEAD HEAD^{tree} HEAD:file &&
> +               git repack -ad &&
> +               echo not-cloned | git hash-object -w --stdin
> +       ) >expect.unsorted &&
> +       git clone -s all-one all-two &&
> +       (
> +               cd all-two &&
> +               echo local-unref | git hash-object -w --stdin
> +       ) >>expect.unsorted &&
> +       sort <expect.unsorted >expect &&
> +       git -C all-two cat-file --batch-all-objects \
> +                               --batch-check="%(objectname)" >actual.unsorted &&
> +       sort <actual.unsorted >actual &&
> +       test_cmp expect actual
> +'
> +
>  test_done
> --
> 2.4.4.719.g3984bc6

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] cat-file: add --batch-all-objects option
  2015-06-26  6:56           ` Eric Sunshine
@ 2015-06-26 15:48             ` Jeff King
  0 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2015-06-26 15:48 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Charles Bailey, Junio C Hamano, Git List

On Fri, Jun 26, 2015 at 02:56:58AM -0400, Eric Sunshine wrote:

> > +test_expect_success 'cat-file --batch-all-objects shows all objects' '
> > +       # make new repos so we now the full set of objects; we will
> 
> s/now/know/

Yeah. I don't think this series otherwise needs re-rolled. Here it is in
an autosquash-able form:

-- >8 --
Subject: [PATCH] fixup! cat-file: add --batch-all-objects option

---
 t/t1006-cat-file.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18dbdc8..4f38078 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -548,7 +548,7 @@ test_expect_success 'git cat-file --batch --follow-symlink returns correct sha a
 '
 
 test_expect_success 'cat-file --batch-all-objects shows all objects' '
-	# make new repos so we now the full set of objects; we will
+	# make new repos so we know the full set of objects; we will
 	# also make sure that there are some packed and some loose
 	# objects, some referenced and some not, and that there are
 	# some available only via alternates.
-- 
2.5.0.rc0.336.g8460790

^ permalink raw reply related	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2015-06-26 15:48 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-19  9:10 Improvements to parse-options and a new filter-objects command Charles Bailey
2015-06-19  9:10 ` [PATCH 1/3] Correct test-parse-options to handle negative ints Charles Bailey
2015-06-19 18:28   ` Junio C Hamano
2015-06-19  9:10 ` [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c Charles Bailey
2015-06-19 11:03   ` Remi Galan Alfonso
2015-06-19 11:06     ` Charles Bailey
2015-06-19 17:58   ` Junio C Hamano
2015-06-19 18:39     ` Junio C Hamano
2015-06-20 15:31       ` Jakub Narębski
2015-06-19 18:47     ` Jakub Narębski
2015-06-20 16:51     ` Charles Bailey
2015-06-20 17:47       ` Junio C Hamano
2015-06-19  9:10 ` [PATCH 3/3] Add filter-objects command Charles Bailey
2015-06-19 10:10   ` Jeff King
2015-06-19 10:33     ` Charles Bailey
2015-06-19 10:52       ` Jeff King
2015-06-19 18:28         ` Junio C Hamano
2015-06-19 10:52       ` John Keeping
2015-06-19 11:04         ` Charles Bailey
2015-06-21 18:25 ` Improvements to integer option parsing Charles Bailey
2015-06-21 18:25   ` [PATCH 1/2] Correct test-parse-options to handle negative ints Charles Bailey
2015-06-21 18:25   ` [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c Charles Bailey
2015-06-21 18:30     ` Charles Bailey
2015-06-22 22:03       ` Junio C Hamano
2015-06-22 22:08     ` Junio C Hamano
2015-06-22 22:09   ` Improvements to integer option parsing Junio C Hamano
2015-06-22 22:42     ` Charles Bailey
2015-06-21 19:20 ` Fast enumeration of objects Charles Bailey
2015-06-21 19:20   ` [PATCH] Add list-all-objects command Charles Bailey
2015-06-22  8:38     ` Jeff King
2015-06-22 10:33       ` Jeff King
2015-06-22 10:40         ` [PATCH 1/7] for_each_packed_object: automatically open pack index Jeff King
2015-06-22 10:40         ` [PATCH 2/7] cat-file: minor style fix in options list Jeff King
2015-06-22 10:41         ` [PATCH 3/7] cat-file: move batch_options definition to top of file Jeff King
2015-06-22 10:45         ` [PATCH 4/7] cat-file: add --buffer option Jeff King
2015-06-22 10:45         ` [PATCH 5/7] cat-file: stop returning value from batch_one_object Jeff King
2015-06-22 10:45         ` [PATCH 6/7] cat-file: split batch_one_object into two stages Jeff King
2015-06-22 10:45         ` [PATCH 7/7] cat-file: add --batch-all-objects option Jeff King
2015-06-26  6:56           ` Eric Sunshine
2015-06-26 15:48             ` Jeff King
2015-06-22 11:06         ` [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects Jeff King
2015-06-22 22:03           ` Charles Bailey
2015-06-22 23:46             ` Jeff King
2015-06-22 21:48         ` [PATCH] Add list-all-objects command Charles Bailey
2015-06-22 21:50         ` Junio C Hamano
2015-06-22 23:50           ` Jeff King
2015-06-22 11:38       ` Charles Bailey
2015-06-22  9:57     ` Duy Nguyen
2015-06-22 10:24       ` Jeff King
2015-06-22  8:35   ` Fast enumeration of objects Jeff King
2015-06-22 19:44     ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.