All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] git-count-objects.txt: describe each line in -v output
@ 2013-02-04 12:49 Nguyễn Thái Ngọc Duy
  2013-02-04 12:49 ` [PATCH 2/2] count-objects: report garbage files in .git/objects/pack directory too Nguyễn Thái Ngọc Duy
  2013-02-08  3:48 ` [PATCH v2 0/3] count-objects improvements Nguyễn Thái Ngọc Duy
  0 siblings, 2 replies; 14+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-02-04 12:49 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

The current description requires a bit of guessing (what clause
corresponds to what printed line?) and lacks information, such as
the unit of size and size-pack.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-count-objects.txt | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index 23c80ce..e816823 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -20,11 +20,21 @@ OPTIONS
 -------
 -v::
 --verbose::
-	In addition to the number of loose objects and disk
-	space consumed, it reports the number of in-pack
-	objects, number of packs, disk space consumed by those packs,
-	and number of objects that can be removed by running
-	`git prune-packed`.
+	Report in more detail:
++
+count: the number of loose objects
++
+size: disk space consumed by loose objects, in KiB
++
+in-pack: the number of in-pack objects
++
+size-pack: disk space consumed by the packs, in KiB
++
+prune-packable: the number of loose objects that are also present in
+the packs. These objects could be pruned using `git prune-packed`.
++
+garbage: the number of files in loose object database that are not
+valid loose objects
 
 GIT
 ---
-- 
1.8.1.2.536.gf441e6d

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] count-objects: report garbage files in .git/objects/pack directory too
  2013-02-04 12:49 [PATCH 1/2] git-count-objects.txt: describe each line in -v output Nguyễn Thái Ngọc Duy
@ 2013-02-04 12:49 ` Nguyễn Thái Ngọc Duy
  2013-02-04 17:06   ` Junio C Hamano
  2013-02-04 18:16   ` Junio C Hamano
  2013-02-08  3:48 ` [PATCH v2 0/3] count-objects improvements Nguyễn Thái Ngọc Duy
  1 sibling, 2 replies; 14+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-02-04 12:49 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

While it's unusual to have strange files in loose object database,
.git/objects/pack/tmp_* is normal after a broken fetch and they
can eat up a lot of disk space if the user does not pay attention.
Report them.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 The hook in prepare_packed_git_one is ugly, but I don't want to
 duplicate the search file logic there in count-objects. Maybe I'm
 wrong.

 Interestingly it reports .commits file in my repo too. A nice
 reminder to myself to remind Jeff about count-objects improvements
 for his commit-cache work :)

 Way may also need a more friendly format than this one, which I
 assume is plumbing. Something that average git user can understand
 without looking up the document. If "git stats" is too much for this
 purpose, perhaps "git gc --stats"?

 Documentation/git-count-objects.txt |  4 ++--
 builtin/count-objects.c             | 27 ++++++++++++++++++++++++++-
 sha1_file.c                         | 23 ++++++++++++++++++++---
 3 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index e816823..1611d7c 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -33,8 +33,8 @@ size-pack: disk space consumed by the packs, in KiB
 prune-packable: the number of loose objects that are also present in
 the packs. These objects could be pruned using `git prune-packed`.
 +
-garbage: the number of files in loose object database that are not
-valid loose objects
+garbage: the number of files in object database that are not valid
+loose objects nor valid packs
 
 GIT
 ---
diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index 9afaa88..e8fabcf 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -9,6 +9,8 @@
 #include "builtin.h"
 #include "parse-options.h"
 
+static unsigned long garbage;
+
 static void count_objects(DIR *d, char *path, int len, int verbose,
 			  unsigned long *loose,
 			  off_t *loose_size,
@@ -65,6 +67,27 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
 	}
 }
 
+extern void (*report_pack_garbage)(const char *path, int len, const char *name);
+static void real_report_pack_garbage(const char *path, int len, const char *name)
+{
+	if (is_dot_or_dotdot(name))
+		return;
+	if (has_extension(name, ".pack")) {
+		struct strbuf idx_file = STRBUF_INIT;
+		struct stat st;
+
+		strbuf_addf(&idx_file, "%.*s/%.*s.idx", len, path,
+			    (int)strlen(name) - 5, name);
+		if (!stat(idx_file.buf, &st) && S_ISREG(st.st_mode)) {
+			strbuf_release(&idx_file);
+			return;
+		}
+		strbuf_release(&idx_file);
+	}
+	error("garbage found: %.*s/%s", len, path, name);
+	garbage++;
+}
+
 static char const * const count_objects_usage[] = {
 	N_("git count-objects [-v]"),
 	NULL
@@ -76,7 +99,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 	const char *objdir = get_object_directory();
 	int len = strlen(objdir);
 	char *path = xmalloc(len + 50);
-	unsigned long loose = 0, packed = 0, packed_loose = 0, garbage = 0;
+	unsigned long loose = 0, packed = 0, packed_loose = 0;
 	off_t loose_size = 0;
 	struct option opts[] = {
 		OPT__VERBOSE(&verbose, N_("be verbose")),
@@ -87,6 +110,8 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 	/* we do not take arguments other than flags for now */
 	if (argc)
 		usage_with_options(count_objects_usage, opts);
+	if (verbose)
+		report_pack_garbage = real_report_pack_garbage;
 	memcpy(path, objdir, len);
 	if (len && objdir[len-1] != '/')
 		path[len++] = '/';
diff --git a/sha1_file.c b/sha1_file.c
index 40b2329..6045946 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1000,6 +1000,17 @@ void install_packed_git(struct packed_git *pack)
 	packed_git = pack;
 }
 
+static void dummy_report_pack_garbage(const char *path,
+				      int len,
+				      const char *name)
+{
+}
+
+void (*report_pack_garbage)(const char *path,
+			    int len,
+			    const char *name) =
+	dummy_report_pack_garbage;
+
 static void prepare_packed_git_one(char *objdir, int local)
 {
 	/* Ensure that this buffer is large enough so that we can
@@ -1024,11 +1035,15 @@ static void prepare_packed_git_one(char *objdir, int local)
 		int namelen = strlen(de->d_name);
 		struct packed_git *p;
 
-		if (!has_extension(de->d_name, ".idx"))
+		if (!has_extension(de->d_name, ".idx")) {
+			report_pack_garbage(path, len - 1, de->d_name);
 			continue;
+		}
 
-		if (len + namelen + 1 > sizeof(path))
+		if (len + namelen + 1 > sizeof(path)) {
+			report_pack_garbage(path, len - 1, de->d_name);
 			continue;
+		}
 
 		/* Don't reopen a pack we already have. */
 		strcpy(path + len, de->d_name);
@@ -1042,8 +1057,10 @@ static void prepare_packed_git_one(char *objdir, int local)
 		 * .pack file that we can map.
 		 */
 		p = add_packed_git(path, len + namelen, local);
-		if (!p)
+		if (!p) {
+			report_pack_garbage(path, len - 1, de->d_name);
 			continue;
+		}
 		install_packed_git(p);
 	}
 	closedir(dir);
-- 
1.8.1.2.536.gf441e6d

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] count-objects: report garbage files in .git/objects/pack directory too
  2013-02-04 12:49 ` [PATCH 2/2] count-objects: report garbage files in .git/objects/pack directory too Nguyễn Thái Ngọc Duy
@ 2013-02-04 17:06   ` Junio C Hamano
  2013-02-04 18:16   ` Junio C Hamano
  1 sibling, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2013-02-04 17:06 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

>  The hook in prepare_packed_git_one is ugly, but I don't want to
>  duplicate the search file logic there in count-objects. Maybe I'm
>  wrong.

In this particular case I do not think you are completely wrong;
you are probably only two thirds wrong ;-)

The idea to use a customizable function pointer to allow it do extra
work is probably fine, but adding three identical calls and continue
is a bad taste.  Just have one callsite for the error path and do
not hesitate to jump to it with 'goto'.

I do not think failure to add_packed_git() should be treated the
same way as other cases where the files readdir() found are truly
garbage, by the way.

I also wonder, especially because you are enumerating the temporary
pack files in .git/objects/pack/, if it make more sense to account
for their sizes as well.  After all, you would end up doing stat()
for a common case of files with ".pack" suffix---doing so for all of
them shouldn't be too bad.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] count-objects: report garbage files in .git/objects/pack directory too
  2013-02-04 12:49 ` [PATCH 2/2] count-objects: report garbage files in .git/objects/pack directory too Nguyễn Thái Ngọc Duy
  2013-02-04 17:06   ` Junio C Hamano
@ 2013-02-04 18:16   ` Junio C Hamano
  2013-02-07  7:37     ` Duy Nguyen
  1 sibling, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2013-02-04 18:16 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> @@ -1024,11 +1035,15 @@ static void prepare_packed_git_one(char *objdir, int local)
>  		int namelen = strlen(de->d_name);
>  		struct packed_git *p;
>  
> -		if (!has_extension(de->d_name, ".idx"))
> +		if (!has_extension(de->d_name, ".idx")) {
> +			report_pack_garbage(path, len - 1, de->d_name);
>  			continue;
> +		}
>  
> -		if (len + namelen + 1 > sizeof(path))
> +		if (len + namelen + 1 > sizeof(path)) {
> +			report_pack_garbage(path, len - 1, de->d_name);
>  			continue;
> +		}
>  
>  		/* Don't reopen a pack we already have. */
>  		strcpy(path + len, de->d_name);
> @@ -1042,8 +1057,10 @@ static void prepare_packed_git_one(char *objdir, int local)
>  		 * .pack file that we can map.
>  		 */
>  		p = add_packed_git(path, len + namelen, local);
> -		if (!p)
> +		if (!p) {
> +			report_pack_garbage(path, len - 1, de->d_name);
>  			continue;
> +		}
>  		install_packed_git(p);
>  	}
>  	closedir(dir);

I forgot to mention one more thing.  Your report_pack_garbage()
special cases ".pack" to see if it is a regular file, but this loop
structure causes a regular file whose name ends with ".pack" but
without corresponding ".idx" file to go unreported.

I think the loop should be restructured to iterate over all known
file types and report unknown ones, if you want to repurpose it for
the reporting, something along this line, perhaps:

	for (each name) {
		if (does it end with ".idx") {
			if (is it unusable ".idx") {
				report garbage;
			}
                        continue;
		}
		if (! we are in report mode)
			continue;
		if (does it end with ".pack") {
			if (!have we seen corresponding ".idx")
				remember it;
			continue;
		}
		report garbage;
	}
	for (remembered pack) {
		if (does it have corresponding ".idx" &&
			is it really usable ".pack")
			continue;
		report garbage;
	}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] count-objects: report garbage files in .git/objects/pack directory too
  2013-02-04 18:16   ` Junio C Hamano
@ 2013-02-07  7:37     ` Duy Nguyen
  2013-02-07 18:12       ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Duy Nguyen @ 2013-02-07  7:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, Feb 04, 2013 at 10:16:23AM -0800, Junio C Hamano wrote:
> I forgot to mention one more thing.  Your report_pack_garbage()
> special cases ".pack" to see if it is a regular file, but this loop
> structure causes a regular file whose name ends with ".pack" but
> without corresponding ".idx" file to go unreported.
> 
> I think the loop should be restructured to iterate over all known
> file types and report unknown ones, if you want to repurpose it for
> the reporting, something along this line, perhaps:
> 
> 	for (each name) {
> 		if (does it end with ".idx") {
> 			if (is it unusable ".idx") {
> 				report garbage;
> 			}
>                         continue;
> 		}
> 		if (! we are in report mode)
> 			continue;
> 		if (does it end with ".pack") {
> 			if (!have we seen corresponding ".idx")
> 				remember it;
> 			continue;
> 		}
> 		report garbage;
> 	}
> 	for (remembered pack) {
> 		if (does it have corresponding ".idx" &&
> 			is it really usable ".pack")
> 			continue;
> 		report garbage;
> 	}
> 

How about the below patch? Ignoring good .commits files will be just a
couple more lines in the "for (remembered_pack)" loop.

Also in another mail in this thread:

> I also wonder, especially because you are enumerating the temporary
> pack files in .git/objects/pack/, if it make more sense to account
> for their sizes as well.  After all, you would end up doing stat()
> for a common case of files with ".pack" suffix---doing so for all of
> them shouldn't be too bad.

I thought about that, but we may need to do extra stat() for loose
garbage as well. As it is now, garbage is complained loudly, which
gives me enough motivation to clean up, even without looking at how
much disk space it uses.

-- 8< --
Subject: count-objects: report garbage pack directory too

prepare_packed_git_one() is modified to allow count-objects to hook a
report function to so we don't need to duplicate the pack searching
logic in count-objects.c. When report_pack_garbage is NULL, the
overhead is insignificant.

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index e816823..1611d7c 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -33,8 +33,8 @@ size-pack: disk space consumed by the packs, in KiB
 prune-packable: the number of loose objects that are also present in
 the packs. These objects could be pruned using `git prune-packed`.
 +
-garbage: the number of files in loose object database that are not
-valid loose objects
+garbage: the number of files in object database that are not valid
+loose objects nor valid packs
 
 GIT
 ---
diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index 9afaa88..7fdd508 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -9,6 +9,8 @@
 #include "builtin.h"
 #include "parse-options.h"
 
+static unsigned long garbage;
+
 static void count_objects(DIR *d, char *path, int len, int verbose,
 			  unsigned long *loose,
 			  off_t *loose_size,
@@ -65,6 +67,16 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
 	}
 }
 
+extern void (*report_pack_garbage)(const char *path, int len, const char *name);
+static void real_report_pack_garbage(const char *path, int len, const char *name)
+{
+	if (len)
+		error("garbage found: %.*s/%s", len, path, name);
+	else
+		error("garbage found: %s", path);
+	garbage++;
+}
+
 static char const * const count_objects_usage[] = {
 	N_("git count-objects [-v]"),
 	NULL
@@ -76,7 +88,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 	const char *objdir = get_object_directory();
 	int len = strlen(objdir);
 	char *path = xmalloc(len + 50);
-	unsigned long loose = 0, packed = 0, packed_loose = 0, garbage = 0;
+	unsigned long loose = 0, packed = 0, packed_loose = 0;
 	off_t loose_size = 0;
 	struct option opts[] = {
 		OPT__VERBOSE(&verbose, N_("be verbose")),
@@ -87,6 +99,8 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 	/* we do not take arguments other than flags for now */
 	if (argc)
 		usage_with_options(count_objects_usage, opts);
+	if (verbose)
+		report_pack_garbage = real_report_pack_garbage;
 	memcpy(path, objdir, len);
 	if (len && objdir[len-1] != '/')
 		path[len++] = '/';
diff --git a/sha1_file.c b/sha1_file.c
index 40b2329..5b70e55 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -21,6 +21,7 @@
 #include "sha1-lookup.h"
 #include "bulk-checkin.h"
 #include "streaming.h"
+#include "dir.h"
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -1000,15 +1001,19 @@ void install_packed_git(struct packed_git *pack)
 	packed_git = pack;
 }
 
+void (*report_pack_garbage)(const char *path, int len, const char *name);
+
 static void prepare_packed_git_one(char *objdir, int local)
 {
 	/* Ensure that this buffer is large enough so that we can
 	   append "/pack/" without clobbering the stack even if
 	   strlen(objdir) were PATH_MAX.  */
 	char path[PATH_MAX + 1 + 4 + 1 + 1];
-	int len;
+	int i, len;
 	DIR *dir;
 	struct dirent *de;
+	struct packed_git *p;
+	struct string_list garbage = STRING_LIST_INIT_DUP;
 
 	sprintf(path, "%s/pack", objdir);
 	len = strlen(path);
@@ -1024,14 +1029,33 @@ static void prepare_packed_git_one(char *objdir, int local)
 		int namelen = strlen(de->d_name);
 		struct packed_git *p;
 
-		if (!has_extension(de->d_name, ".idx"))
+		if (len + namelen + 1 > sizeof(path)) {
+			if (report_pack_garbage)
+				report_pack_garbage(path, len - 1, de->d_name);
 			continue;
+		}
 
-		if (len + namelen + 1 > sizeof(path))
+		strcpy(path + len, de->d_name);
+
+		if (!has_extension(de->d_name, ".idx")) {
+			if (!report_pack_garbage)
+				continue;
+			if (is_dot_or_dotdot(de->d_name))
+				continue;
+			if (!has_extension(de->d_name, ".pack")) {
+				report_pack_garbage(path, 0, NULL);
+				continue;
+			}
+			/*
+			 * we can't decide right know if this .pack is
+			 * garbage. Delay until we identify all good
+			 * packs.
+			 */
+			string_list_append(&garbage, path);
 			continue;
+		}
 
 		/* Don't reopen a pack we already have. */
-		strcpy(path + len, de->d_name);
 		for (p = packed_git; p; p = p->next) {
 			if (!memcmp(path, p->pack_name, len + namelen - 4))
 				break;
@@ -1047,6 +1071,25 @@ static void prepare_packed_git_one(char *objdir, int local)
 		install_packed_git(p);
 	}
 	closedir(dir);
+
+	if (!report_pack_garbage)
+		return;
+
+	sort_string_list(&garbage);
+	for (p = packed_git; p; p = p->next) {
+		struct string_list_item *item;
+		if (!p->pack_local)
+			continue;
+		item = string_list_lookup(&garbage, p->pack_name);
+		if (item)
+			item->util = &garbage; /* anything but NULL */
+	}
+	for (i = 0; i < garbage.nr; i++) {
+		struct string_list_item *item = garbage.items + i;
+		if (!item->util)
+			report_pack_garbage(item->string, 0, NULL);
+	}
+	string_list_clear(&garbage, 0);
 }
 
 static int sort_pack(const void *a_, const void *b_)
-- 8< --

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] count-objects: report garbage files in .git/objects/pack directory too
  2013-02-07  7:37     ` Duy Nguyen
@ 2013-02-07 18:12       ` Junio C Hamano
  2013-02-07 23:58         ` Duy Nguyen
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2013-02-07 18:12 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: git

Duy Nguyen <pclouds@gmail.com> writes:

> I thought about that, but we may need to do extra stat() for loose
> garbage as well. As it is now, garbage is complained loudly, which
> gives me enough motivation to clean up, even without looking at how
> much disk space it uses.

I wouldn't call a single line "garbage: 4" exactly *loud*.  I also
think that this is not about *motivating* you, but about giving
more information to the users to help them assess the health of
their repository themselves.

By the way, I wonder if we also want to notice .git/objects/garbage
or .git/objects/ga/rbage if we are to do this?

> -- 8< --
> Subject: count-objects: report garbage pack directory too
>
> prepare_packed_git_one() is modified to allow count-objects to hook a
> report function to so we don't need to duplicate the pack searching
> logic in count-objects.c. When report_pack_garbage is NULL, the
> overhead is insignificant.
>
> diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
> index e816823..1611d7c 100644
> --- a/Documentation/git-count-objects.txt
> +++ b/Documentation/git-count-objects.txt
> @@ -33,8 +33,8 @@ size-pack: disk space consumed by the packs, in KiB
>  prune-packable: the number of loose objects that are also present in
>  the packs. These objects could be pruned using `git prune-packed`.
>  +
> -garbage: the number of files in loose object database that are not
> -valid loose objects
> +garbage: the number of files in object database that are not valid
> +loose objects nor valid packs
>  
>  GIT
>  ---
> diff --git a/builtin/count-objects.c b/builtin/count-objects.c
> index 9afaa88..7fdd508 100644
> --- a/builtin/count-objects.c
> +++ b/builtin/count-objects.c
> @@ -9,6 +9,8 @@
>  #include "builtin.h"
>  #include "parse-options.h"
>  
> +static unsigned long garbage;
> +
>  static void count_objects(DIR *d, char *path, int len, int verbose,
>  			  unsigned long *loose,
>  			  off_t *loose_size,
> @@ -65,6 +67,16 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
>  	}
>  }
>  
> +extern void (*report_pack_garbage)(const char *path, int len, const char *name);
> +static void real_report_pack_garbage(const char *path, int len, const char *name)
> +{
> +	if (len)
> +		error("garbage found: %.*s/%s", len, path, name);
> +	else
> +		error("garbage found: %s", path);
> +	garbage++;
> +}
> +
>  static char const * const count_objects_usage[] = {
>  	N_("git count-objects [-v]"),
>  	NULL
> @@ -76,7 +88,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
>  	const char *objdir = get_object_directory();
>  	int len = strlen(objdir);
>  	char *path = xmalloc(len + 50);
> -	unsigned long loose = 0, packed = 0, packed_loose = 0, garbage = 0;
> +	unsigned long loose = 0, packed = 0, packed_loose = 0;
>  	off_t loose_size = 0;
>  	struct option opts[] = {
>  		OPT__VERBOSE(&verbose, N_("be verbose")),
> @@ -87,6 +99,8 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
>  	/* we do not take arguments other than flags for now */
>  	if (argc)
>  		usage_with_options(count_objects_usage, opts);
> +	if (verbose)
> +		report_pack_garbage = real_report_pack_garbage;
>  	memcpy(path, objdir, len);
>  	if (len && objdir[len-1] != '/')
>  		path[len++] = '/';
> diff --git a/sha1_file.c b/sha1_file.c
> index 40b2329..5b70e55 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -21,6 +21,7 @@
>  #include "sha1-lookup.h"
>  #include "bulk-checkin.h"
>  #include "streaming.h"
> +#include "dir.h"
>  
>  #ifndef O_NOATIME
>  #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
> @@ -1000,15 +1001,19 @@ void install_packed_git(struct packed_git *pack)
>  	packed_git = pack;
>  }
>  
> +void (*report_pack_garbage)(const char *path, int len, const char *name);
> +
>  static void prepare_packed_git_one(char *objdir, int local)
>  {
>  	/* Ensure that this buffer is large enough so that we can
>  	   append "/pack/" without clobbering the stack even if
>  	   strlen(objdir) were PATH_MAX.  */
>  	char path[PATH_MAX + 1 + 4 + 1 + 1];
> -	int len;
> +	int i, len;
>  	DIR *dir;
>  	struct dirent *de;
> +	struct packed_git *p;
> +	struct string_list garbage = STRING_LIST_INIT_DUP;
>  
>  	sprintf(path, "%s/pack", objdir);
>  	len = strlen(path);
> @@ -1024,14 +1029,33 @@ static void prepare_packed_git_one(char *objdir, int local)
>  		int namelen = strlen(de->d_name);
>  		struct packed_git *p;
>  
> -		if (!has_extension(de->d_name, ".idx"))
> +		if (len + namelen + 1 > sizeof(path)) {
> +			if (report_pack_garbage)
> +				report_pack_garbage(path, len - 1, de->d_name);
>  			continue;
> +		}
>  
> -		if (len + namelen + 1 > sizeof(path))
> +		strcpy(path + len, de->d_name);
> +
> +		if (!has_extension(de->d_name, ".idx")) {
> +			if (!report_pack_garbage)
> +				continue;
> +			if (is_dot_or_dotdot(de->d_name))
> +				continue;
> +			if (!has_extension(de->d_name, ".pack")) {
> +				report_pack_garbage(path, 0, NULL);
> +				continue;
> +			}

Didn't I already say the logic should be inverted to whitelist the
known ones?  Saying "Anything that is not '.pack' is bad" here is a
direct opposite, I think.  

Add "A '.keep' file is OK" to this codeflow and see how it goes.

> +			/*
> +			 * we can't decide right know if this .pack is
> +			 * garbage. Delay until we identify all good
> +			 * packs.
> +			 */
> +			string_list_append(&garbage, path);
>  			continue;
> +		}
>  
>  		/* Don't reopen a pack we already have. */
> -		strcpy(path + len, de->d_name);
>  		for (p = packed_git; p; p = p->next) {
>  			if (!memcmp(path, p->pack_name, len + namelen - 4))
>  				break;
> @@ -1047,6 +1071,25 @@ static void prepare_packed_git_one(char *objdir, int local)
>  		install_packed_git(p);
>  	}
>  	closedir(dir);
> +
> +	if (!report_pack_garbage)
> +		return;
> +
> +	sort_string_list(&garbage);
> +	for (p = packed_git; p; p = p->next) {
> +		struct string_list_item *item;
> +		if (!p->pack_local)
> +			continue;
> +		item = string_list_lookup(&garbage, p->pack_name);
> +		if (item)
> +			item->util = &garbage; /* anything but NULL */
> +	}
> +	for (i = 0; i < garbage.nr; i++) {
> +		struct string_list_item *item = garbage.items + i;
> +		if (!item->util)
> +			report_pack_garbage(item->string, 0, NULL);
> +	}
> +	string_list_clear(&garbage, 0);
>  }
>  
>  static int sort_pack(const void *a_, const void *b_)
> -- 8< --

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] count-objects: report garbage files in .git/objects/pack directory too
  2013-02-07 18:12       ` Junio C Hamano
@ 2013-02-07 23:58         ` Duy Nguyen
  0 siblings, 0 replies; 14+ messages in thread
From: Duy Nguyen @ 2013-02-07 23:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Feb 8, 2013 at 1:12 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> I thought about that, but we may need to do extra stat() for loose
>> garbage as well. As it is now, garbage is complained loudly, which
>> gives me enough motivation to clean up, even without looking at how
>> much disk space it uses.
>
> I wouldn't call a single line "garbage: 4" exactly *loud*.  I also
> think that this is not about *motivating* you, but about giving
> more information to the users to help them assess the health of
> their repository themselves.

That's not the only line it prints:

error: garbage found:
.git/objects/pack/pack-8074dfd2b01f494a30f02d0374baa57632d26fea.commits
error: garbage found:
.git/objects/pack/pack-834c9dccca7634c2b225db1b5050a980cb2c2de0.commits
error: garbage found: .git/objects/pack/tmp_pack_G235da
error: garbage found:
.git/objects/pack/pack-8bdf298e9252573289cd4f1e83e80c9f261882a2.commits
count: 604
size: 5576
in-pack: 172307
packs: 4
size-pack: 50421
prune-packable: 4
garbage: 4

> By the way, I wonder if we also want to notice .git/objects/garbage
> or .git/objects/ga/rbage if we are to do this?

I prefer not (for code simplicity) because we may need to catch
.git/objects/pack/info/garbage too if we do that.

>> +                     if (!has_extension(de->d_name, ".pack")) {
>> +                             report_pack_garbage(path, 0, NULL);
>> +                             continue;
>> +                     }
>
> Didn't I already say the logic should be inverted to whitelist the
> known ones?  Saying "Anything that is not '.pack' is bad" here is a
> direct opposite, I think.

I must have missed it. Will do.

>
> Add "A '.keep' file is OK" to this codeflow and see how it goes.

OK
-- 
Duy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 0/3] count-objects improvements
  2013-02-04 12:49 [PATCH 1/2] git-count-objects.txt: describe each line in -v output Nguyễn Thái Ngọc Duy
  2013-02-04 12:49 ` [PATCH 2/2] count-objects: report garbage files in .git/objects/pack directory too Nguyễn Thái Ngọc Duy
@ 2013-02-08  3:48 ` Nguyễn Thái Ngọc Duy
  2013-02-08  3:48   ` [PATCH v2 1/3] git-count-objects.txt: describe each line in -v output Nguyễn Thái Ngọc Duy
                     ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-02-08  3:48 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

This series:
 - updates count-objects -v documentation, describe each line in detail
 - counts garbage files in pack directory in addition to loose odb
 - shows how much disk space consumed by garbage files

Nguyễn Thái Ngọc Duy (3):
  git-count-objects.txt: describe each line in -v output
  count-objects: report garbage files in pack directory too
  count-objects: report how much disk space taken by garbage files

 Documentation/git-count-objects.txt | 22 +++++++---
 builtin/count-objects.c             | 41 ++++++++++++++-----
 sha1_file.c                         | 81 +++++++++++++++++++++++++++++++++++--
 3 files changed, 127 insertions(+), 17 deletions(-)

-- 
1.8.1.2.495.g3fdf5d5.dirty

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/3] git-count-objects.txt: describe each line in -v output
  2013-02-08  3:48 ` [PATCH v2 0/3] count-objects improvements Nguyễn Thái Ngọc Duy
@ 2013-02-08  3:48   ` Nguyễn Thái Ngọc Duy
  2013-02-08  3:48   ` [PATCH v2 2/3] count-objects: report garbage files in pack directory too Nguyễn Thái Ngọc Duy
  2013-02-08  3:48   ` [PATCH v2 3/3] count-objects: report how much disk space taken by garbage files Nguyễn Thái Ngọc Duy
  2 siblings, 0 replies; 14+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-02-08  3:48 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

The current description requires a bit of guessing (what clause
corresponds to what printed line?) and lacks information, such as
the unit of size and size-pack.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-count-objects.txt | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index 23c80ce..e816823 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -20,11 +20,21 @@ OPTIONS
 -------
 -v::
 --verbose::
-	In addition to the number of loose objects and disk
-	space consumed, it reports the number of in-pack
-	objects, number of packs, disk space consumed by those packs,
-	and number of objects that can be removed by running
-	`git prune-packed`.
+	Report in more detail:
++
+count: the number of loose objects
++
+size: disk space consumed by loose objects, in KiB
++
+in-pack: the number of in-pack objects
++
+size-pack: disk space consumed by the packs, in KiB
++
+prune-packable: the number of loose objects that are also present in
+the packs. These objects could be pruned using `git prune-packed`.
++
+garbage: the number of files in loose object database that are not
+valid loose objects
 
 GIT
 ---
-- 
1.8.1.2.495.g3fdf5d5.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/3] count-objects: report garbage files in pack directory too
  2013-02-08  3:48 ` [PATCH v2 0/3] count-objects improvements Nguyễn Thái Ngọc Duy
  2013-02-08  3:48   ` [PATCH v2 1/3] git-count-objects.txt: describe each line in -v output Nguyễn Thái Ngọc Duy
@ 2013-02-08  3:48   ` Nguyễn Thái Ngọc Duy
  2013-02-08 18:44     ` Junio C Hamano
  2013-02-08  3:48   ` [PATCH v2 3/3] count-objects: report how much disk space taken by garbage files Nguyễn Thái Ngọc Duy
  2 siblings, 1 reply; 14+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-02-08  3:48 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

prepare_packed_git_one() is modified to allow count-objects to hook a
report function to so we don't need to duplicate the pack searching
logic in count-objects.c. When report_pack_garbage is NULL, the
overhead is insignificant.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-count-objects.txt |  4 +-
 builtin/count-objects.c             | 18 ++++++++-
 sha1_file.c                         | 81 +++++++++++++++++++++++++++++++++++--
 3 files changed, 97 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index e816823..1611d7c 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -33,8 +33,8 @@ size-pack: disk space consumed by the packs, in KiB
 prune-packable: the number of loose objects that are also present in
 the packs. These objects could be pruned using `git prune-packed`.
 +
-garbage: the number of files in loose object database that are not
-valid loose objects
+garbage: the number of files in object database that are not valid
+loose objects nor valid packs
 
 GIT
 ---
diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index 9afaa88..118b2ae 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -9,6 +9,20 @@
 #include "builtin.h"
 #include "parse-options.h"
 
+static unsigned long garbage;
+
+extern void (*report_pack_garbage)(const char *path, int len, const char *name);
+static void real_report_pack_garbage(const char *path, int len, const char *name)
+{
+	if (len && name)
+		error("garbage found: %.*s/%s", len, path, name);
+	else if (!len && name)
+		error("garbage found: %s%s", path, name);
+	else
+		error("garbage found: %s", path);
+	garbage++;
+}
+
 static void count_objects(DIR *d, char *path, int len, int verbose,
 			  unsigned long *loose,
 			  off_t *loose_size,
@@ -76,7 +90,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 	const char *objdir = get_object_directory();
 	int len = strlen(objdir);
 	char *path = xmalloc(len + 50);
-	unsigned long loose = 0, packed = 0, packed_loose = 0, garbage = 0;
+	unsigned long loose = 0, packed = 0, packed_loose = 0;
 	off_t loose_size = 0;
 	struct option opts[] = {
 		OPT__VERBOSE(&verbose, N_("be verbose")),
@@ -87,6 +101,8 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 	/* we do not take arguments other than flags for now */
 	if (argc)
 		usage_with_options(count_objects_usage, opts);
+	if (verbose)
+		report_pack_garbage = real_report_pack_garbage;
 	memcpy(path, objdir, len);
 	if (len && objdir[len-1] != '/')
 		path[len++] = '/';
diff --git a/sha1_file.c b/sha1_file.c
index 40b2329..cc6ef03 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -21,6 +21,7 @@
 #include "sha1-lookup.h"
 #include "bulk-checkin.h"
 #include "streaming.h"
+#include "dir.h"
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -1000,6 +1001,54 @@ void install_packed_git(struct packed_git *pack)
 	packed_git = pack;
 }
 
+/* A hook for count-objects to report invalid files in pack directory */
+void (*report_pack_garbage)(const char *path, int len, const char *name);
+
+static const char *known_pack_extensions[] = { ".pack", ".keep", NULL };
+
+static void report_garbage(struct string_list *list)
+{
+	struct strbuf sb = STRBUF_INIT;
+	struct packed_git *p;
+	int i;
+
+	if (!report_pack_garbage)
+		return;
+
+	sort_string_list(list);
+
+	for (p = packed_git; p; p = p->next) {
+		struct string_list_item *item;
+		if (!p->pack_local)
+			continue;
+		strbuf_reset(&sb);
+		strbuf_add(&sb, p->pack_name,
+			   strlen(p->pack_name) - strlen(".pack"));
+		item = string_list_lookup(list, sb.buf);
+		if (!item)
+			continue;
+		/*
+		 * string_list_lookup does not guarantee to return the
+		 * first matched string if it's duplicated.
+		 */
+		while (item - list->items &&
+		       !strcmp(item[-1].string, item->string))
+			item--;
+		while (item - list->items < list->nr &&
+		       !strcmp(item->string, sb.buf)) {
+			item->util = NULL; /* non-garbage mark */
+			item++;
+		}
+	}
+	for (i = 0; i < list->nr; i++) {
+		struct string_list_item *item = list->items + i;
+		if (!item->util)
+			continue;
+		report_pack_garbage(item->string, 0, item->util);
+	}
+	strbuf_release(&sb);
+}
+
 static void prepare_packed_git_one(char *objdir, int local)
 {
 	/* Ensure that this buffer is large enough so that we can
@@ -1009,6 +1058,7 @@ static void prepare_packed_git_one(char *objdir, int local)
 	int len;
 	DIR *dir;
 	struct dirent *de;
+	struct string_list garbage = STRING_LIST_INIT_DUP;
 
 	sprintf(path, "%s/pack", objdir);
 	len = strlen(path);
@@ -1024,14 +1074,37 @@ static void prepare_packed_git_one(char *objdir, int local)
 		int namelen = strlen(de->d_name);
 		struct packed_git *p;
 
-		if (!has_extension(de->d_name, ".idx"))
+		if (len + namelen + 1 > sizeof(path)) {
+			if (report_pack_garbage)
+				report_pack_garbage(path, len - 1, de->d_name);
 			continue;
+		}
+
+		strcpy(path + len, de->d_name);
 
-		if (len + namelen + 1 > sizeof(path))
+		if (!has_extension(de->d_name, ".idx")) {
+			struct string_list_item *item;
+			int i, n;
+			if (!report_pack_garbage)
+				continue;
+			if (is_dot_or_dotdot(de->d_name))
+				continue;
+			for (i = 0; known_pack_extensions[i]; i++)
+				if (has_extension(de->d_name,
+						  known_pack_extensions[i]))
+					break;
+			if (!known_pack_extensions[i]) {
+				report_pack_garbage(path, 0, NULL);
+				continue;
+			}
+			n = strlen(path) - strlen(known_pack_extensions[i]);
+			item = string_list_append_nodup(&garbage,
+							xstrndup(path, n));
+			item->util = (void*)known_pack_extensions[i];
 			continue;
+		}
 
 		/* Don't reopen a pack we already have. */
-		strcpy(path + len, de->d_name);
 		for (p = packed_git; p; p = p->next) {
 			if (!memcmp(path, p->pack_name, len + namelen - 4))
 				break;
@@ -1047,6 +1120,8 @@ static void prepare_packed_git_one(char *objdir, int local)
 		install_packed_git(p);
 	}
 	closedir(dir);
+	report_garbage(&garbage);
+	string_list_clear(&garbage, 0);
 }
 
 static int sort_pack(const void *a_, const void *b_)
-- 
1.8.1.2.495.g3fdf5d5.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/3] count-objects: report how much disk space taken by garbage files
  2013-02-08  3:48 ` [PATCH v2 0/3] count-objects improvements Nguyễn Thái Ngọc Duy
  2013-02-08  3:48   ` [PATCH v2 1/3] git-count-objects.txt: describe each line in -v output Nguyễn Thái Ngọc Duy
  2013-02-08  3:48   ` [PATCH v2 2/3] count-objects: report garbage files in pack directory too Nguyễn Thái Ngọc Duy
@ 2013-02-08  3:48   ` Nguyễn Thái Ngọc Duy
  2013-02-08 18:47     ` Junio C Hamano
  2 siblings, 1 reply; 14+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-02-08  3:48 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 We may do some redundant stat() here, but I don't think it can slow
 count-objects down much to worry about.

 Documentation/git-count-objects.txt |  2 ++
 builtin/count-objects.c             | 29 ++++++++++++++++++-----------
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index 1611d7c..da6e72e 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -35,6 +35,8 @@ the packs. These objects could be pruned using `git prune-packed`.
 +
 garbage: the number of files in object database that are not valid
 loose objects nor valid packs
++
+size-garbage: disk space consumed by garbage files, in KiB
 
 GIT
 ---
diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index 118b2ae..90d476d 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -10,24 +10,33 @@
 #include "parse-options.h"
 
 static unsigned long garbage;
+static off_t size_garbage;
 
 extern void (*report_pack_garbage)(const char *path, int len, const char *name);
 static void real_report_pack_garbage(const char *path, int len, const char *name)
 {
+	struct strbuf sb = STRBUF_INIT;
+	struct stat st;
+
 	if (len && name)
-		error("garbage found: %.*s/%s", len, path, name);
+		strbuf_addf(&sb, "%.*s/%s", len, path, name);
 	else if (!len && name)
-		error("garbage found: %s%s", path, name);
+		strbuf_addf(&sb, "%s%s", path, name);
 	else
-		error("garbage found: %s", path);
+		strbuf_addf(&sb, "%s", path);
+	error(_("garbage found: %s"), sb.buf);
+
+	if (!stat(sb.buf, &st))
+		size_garbage += st.st_size;
+
 	garbage++;
+	strbuf_release(&sb);
 }
 
 static void count_objects(DIR *d, char *path, int len, int verbose,
 			  unsigned long *loose,
 			  off_t *loose_size,
-			  unsigned long *packed_loose,
-			  unsigned long *garbage)
+			  unsigned long *packed_loose)
 {
 	struct dirent *ent;
 	while ((ent = readdir(d)) != NULL) {
@@ -59,11 +68,8 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
 				(*loose_size) += xsize_t(on_disk_bytes(st));
 		}
 		if (bad) {
-			if (verbose) {
-				error("garbage found: %.*s/%s",
-				      len + 2, path, ent->d_name);
-				(*garbage)++;
-			}
+			if (verbose)
+				report_pack_garbage(path, len + 2, ent->d_name);
 			continue;
 		}
 		(*loose)++;
@@ -113,7 +119,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 		if (!d)
 			continue;
 		count_objects(d, path, len, verbose,
-			      &loose, &loose_size, &packed_loose, &garbage);
+			      &loose, &loose_size, &packed_loose);
 		closedir(d);
 	}
 	if (verbose) {
@@ -138,6 +144,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 		printf("size-pack: %lu\n", (unsigned long) (size_pack / 1024));
 		printf("prune-packable: %lu\n", packed_loose);
 		printf("garbage: %lu\n", garbage);
+		printf("size-garbage: %lu\n", (unsigned long) (size_garbage / 1024));
 	}
 	else
 		printf("%lu objects, %lu kilobytes\n",
-- 
1.8.1.2.495.g3fdf5d5.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/3] count-objects: report garbage files in pack directory too
  2013-02-08  3:48   ` [PATCH v2 2/3] count-objects: report garbage files in pack directory too Nguyễn Thái Ngọc Duy
@ 2013-02-08 18:44     ` Junio C Hamano
  2013-02-09  1:58       ` Duy Nguyen
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2013-02-08 18:44 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> prepare_packed_git_one() is modified to allow count-objects to hook a
> report function to so we don't need to duplicate the pack searching
> logic in count-objects.c. When report_pack_garbage is NULL, the
> overhead is insignificant.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Documentation/git-count-objects.txt |  4 +-
>  builtin/count-objects.c             | 18 ++++++++-
>  sha1_file.c                         | 81 +++++++++++++++++++++++++++++++++++--
>  3 files changed, 97 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
> index e816823..1611d7c 100644
> --- a/Documentation/git-count-objects.txt
> +++ b/Documentation/git-count-objects.txt
> @@ -33,8 +33,8 @@ size-pack: disk space consumed by the packs, in KiB
>  prune-packable: the number of loose objects that are also present in
>  the packs. These objects could be pruned using `git prune-packed`.
>  +
> -garbage: the number of files in loose object database that are not
> -valid loose objects
> +garbage: the number of files in object database that are not valid
> +loose objects nor valid packs
>  
>  GIT
>  ---
> diff --git a/builtin/count-objects.c b/builtin/count-objects.c
> index 9afaa88..118b2ae 100644
> --- a/builtin/count-objects.c
> +++ b/builtin/count-objects.c
> @@ -9,6 +9,20 @@
>  #include "builtin.h"
>  #include "parse-options.h"
>  
> +static unsigned long garbage;
> +
> +extern void (*report_pack_garbage)(const char *path, int len, const char *name);
> +static void real_report_pack_garbage(const char *path, int len, const char *name)
> +{

Don't some callers call this on paths outside objects/pack/
directory?  Is it still report-pack-garbage?

> +	if (len && name)
> +		error("garbage found: %.*s/%s", len, path, name);
> +	else if (!len && name)
> +		error("garbage found: %s%s", path, name);
> +	else
> +		error("garbage found: %s", path);
> +	garbage++;
> +}
> +
>  static void count_objects(DIR *d, char *path, int len, int verbose,
>  			  unsigned long *loose,
>  			  off_t *loose_size,
> @@ -76,7 +90,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
>  	const char *objdir = get_object_directory();
>  	int len = strlen(objdir);
>  	char *path = xmalloc(len + 50);
> -	unsigned long loose = 0, packed = 0, packed_loose = 0, garbage = 0;
> +	unsigned long loose = 0, packed = 0, packed_loose = 0;
>  	off_t loose_size = 0;
>  	struct option opts[] = {
>  		OPT__VERBOSE(&verbose, N_("be verbose")),
> @@ -87,6 +101,8 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
>  	/* we do not take arguments other than flags for now */
>  	if (argc)
>  		usage_with_options(count_objects_usage, opts);
> +	if (verbose)
> +		report_pack_garbage = real_report_pack_garbage;
>  	memcpy(path, objdir, len);
>  	if (len && objdir[len-1] != '/')
>  		path[len++] = '/';
> diff --git a/sha1_file.c b/sha1_file.c
> index 40b2329..cc6ef03 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -21,6 +21,7 @@
>  #include "sha1-lookup.h"
>  #include "bulk-checkin.h"
>  #include "streaming.h"
> +#include "dir.h"
>  
>  #ifndef O_NOATIME
>  #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
> @@ -1000,6 +1001,54 @@ void install_packed_git(struct packed_git *pack)
>  	packed_git = pack;
>  }
>  
> +/* A hook for count-objects to report invalid files in pack directory */
> +void (*report_pack_garbage)(const char *path, int len, const char *name);
> +
> +static const char *known_pack_extensions[] = { ".pack", ".keep", NULL };

This sounds wrong.  Isn't ".idx" also known?

> +static void report_garbage(struct string_list *list)
> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	struct packed_git *p;
> +	int i;
> +
> +	if (!report_pack_garbage)
> +		return;
> +
> +	sort_string_list(list);
> +
> +	for (p = packed_git; p; p = p->next) {
> +		struct string_list_item *item;
> +		if (!p->pack_local)
> +			continue;
> +		strbuf_reset(&sb);
> +		strbuf_add(&sb, p->pack_name,
> +			   strlen(p->pack_name) - strlen(".pack"));
> +		item = string_list_lookup(list, sb.buf);
> +		if (!item)
> +			continue;
> +		/*
> +		 * string_list_lookup does not guarantee to return the
> +		 * first matched string if it's duplicated.
> +		 */
> +		while (item - list->items &&
> +		       !strcmp(item[-1].string, item->string))
> +			item--;
> +		while (item - list->items < list->nr &&
> +		       !strcmp(item->string, sb.buf)) {
> +			item->util = NULL; /* non-garbage mark */
> +			item++;
> +		}
> +	}
> +	for (i = 0; i < list->nr; i++) {
> +		struct string_list_item *item = list->items + i;
> +		if (!item->util)
> +			continue;
> +		report_pack_garbage(item->string, 0, item->util);
> +	}
> +	strbuf_release(&sb);
> +}
> +
>  static void prepare_packed_git_one(char *objdir, int local)
>  {
>  	/* Ensure that this buffer is large enough so that we can
> @@ -1009,6 +1058,7 @@ static void prepare_packed_git_one(char *objdir, int local)
>  	int len;
>  	DIR *dir;
>  	struct dirent *de;
> +	struct string_list garbage = STRING_LIST_INIT_DUP;
>  
>  	sprintf(path, "%s/pack", objdir);
>  	len = strlen(path);
> @@ -1024,14 +1074,37 @@ static void prepare_packed_git_one(char *objdir, int local)
>  		int namelen = strlen(de->d_name);
>  		struct packed_git *p;
>  
> -		if (!has_extension(de->d_name, ".idx"))
> +		if (len + namelen + 1 > sizeof(path)) {
> +			if (report_pack_garbage)
> +				report_pack_garbage(path, len - 1, de->d_name);

A pack/in/a/very/long/path/pack-0000000000000000000000000000000000000000.pack
may pass when fed to "git verify-pack", but this will report it as "garbage",
without reporting what is wrong with it.  Wouldn't that confuse users?

>  			continue;
> +		}
> +
> +		strcpy(path + len, de->d_name);
>  
> -		if (len + namelen + 1 > sizeof(path))
> +		if (!has_extension(de->d_name, ".idx")) {
> +			struct string_list_item *item;
> +			int i, n;
> +			if (!report_pack_garbage)
> +				continue;
> +			if (is_dot_or_dotdot(de->d_name))
> +				continue;
> +			for (i = 0; known_pack_extensions[i]; i++)
> +				if (has_extension(de->d_name,
> +						  known_pack_extensions[i]))
> +					break;
> +			if (!known_pack_extensions[i]) {
> +				report_pack_garbage(path, 0, NULL);
> +				continue;
> +			}
> +			n = strlen(path) - strlen(known_pack_extensions[i]);
> +			item = string_list_append_nodup(&garbage,
> +							xstrndup(path, n));
> +			item->util = (void*)known_pack_extensions[i];
>  			continue;
> +		}

Why isn't this part more like this?

		if (dot-or-dotdot) {
			continue;
		} else if (has_extension(de->d_name, ".idx")) {
			do things for the .idx file;
		} else if (has_extension(de->d_name, ".pack") {
			do things for the .pack file, including
                        queuing the name if we haven't seen
			corresponding .idx for later examination;
		} else if (has_extension(de->d_name, ".keep") {
                	nothing special for now but we may
                        want to add some other checks later
		} else {
                	everything else is a garbage
                        report_pack_garbage();
		}
                        

>  
>  		/* Don't reopen a pack we already have. */
> -		strcpy(path + len, de->d_name);
>  		for (p = packed_git; p; p = p->next) {
>  			if (!memcmp(path, p->pack_name, len + namelen - 4))
>  				break;
> @@ -1047,6 +1120,8 @@ static void prepare_packed_git_one(char *objdir, int local)
>  		install_packed_git(p);
>  	}
>  	closedir(dir);
> +	report_garbage(&garbage);
> +	string_list_clear(&garbage, 0);
>  }
>  
>  static int sort_pack(const void *a_, const void *b_)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 3/3] count-objects: report how much disk space taken by garbage files
  2013-02-08  3:48   ` [PATCH v2 3/3] count-objects: report how much disk space taken by garbage files Nguyễn Thái Ngọc Duy
@ 2013-02-08 18:47     ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2013-02-08 18:47 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  We may do some redundant stat() here, but I don't think it can slow
>  count-objects down much to worry about.

I don't either.  Looks like a good change to me.

It appears that the sb.buf refactoring is better done to the
previous patch, but that is minor.

>  Documentation/git-count-objects.txt |  2 ++
>  builtin/count-objects.c             | 29 ++++++++++++++++++-----------
>  2 files changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
> index 1611d7c..da6e72e 100644
> --- a/Documentation/git-count-objects.txt
> +++ b/Documentation/git-count-objects.txt
> @@ -35,6 +35,8 @@ the packs. These objects could be pruned using `git prune-packed`.
>  +
>  garbage: the number of files in object database that are not valid
>  loose objects nor valid packs
> ++
> +size-garbage: disk space consumed by garbage files, in KiB
>  
>  GIT
>  ---
> diff --git a/builtin/count-objects.c b/builtin/count-objects.c
> index 118b2ae..90d476d 100644
> --- a/builtin/count-objects.c
> +++ b/builtin/count-objects.c
> @@ -10,24 +10,33 @@
>  #include "parse-options.h"
>  
>  static unsigned long garbage;
> +static off_t size_garbage;
>  
>  extern void (*report_pack_garbage)(const char *path, int len, const char *name);
>  static void real_report_pack_garbage(const char *path, int len, const char *name)
>  {
> +	struct strbuf sb = STRBUF_INIT;
> +	struct stat st;
> +
>  	if (len && name)
> -		error("garbage found: %.*s/%s", len, path, name);
> +		strbuf_addf(&sb, "%.*s/%s", len, path, name);
>  	else if (!len && name)
> -		error("garbage found: %s%s", path, name);
> +		strbuf_addf(&sb, "%s%s", path, name);
>  	else
> -		error("garbage found: %s", path);
> +		strbuf_addf(&sb, "%s", path);
> +	error(_("garbage found: %s"), sb.buf);
> +
> +	if (!stat(sb.buf, &st))
> +		size_garbage += st.st_size;
> +
>  	garbage++;
> +	strbuf_release(&sb);
>  }
>  
>  static void count_objects(DIR *d, char *path, int len, int verbose,
>  			  unsigned long *loose,
>  			  off_t *loose_size,
> -			  unsigned long *packed_loose,
> -			  unsigned long *garbage)
> +			  unsigned long *packed_loose)
>  {
>  	struct dirent *ent;
>  	while ((ent = readdir(d)) != NULL) {
> @@ -59,11 +68,8 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
>  				(*loose_size) += xsize_t(on_disk_bytes(st));
>  		}
>  		if (bad) {
> -			if (verbose) {
> -				error("garbage found: %.*s/%s",
> -				      len + 2, path, ent->d_name);
> -				(*garbage)++;
> -			}
> +			if (verbose)
> +				report_pack_garbage(path, len + 2, ent->d_name);
>  			continue;
>  		}
>  		(*loose)++;
> @@ -113,7 +119,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
>  		if (!d)
>  			continue;
>  		count_objects(d, path, len, verbose,
> -			      &loose, &loose_size, &packed_loose, &garbage);
> +			      &loose, &loose_size, &packed_loose);
>  		closedir(d);
>  	}
>  	if (verbose) {
> @@ -138,6 +144,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
>  		printf("size-pack: %lu\n", (unsigned long) (size_pack / 1024));
>  		printf("prune-packable: %lu\n", packed_loose);
>  		printf("garbage: %lu\n", garbage);
> +		printf("size-garbage: %lu\n", (unsigned long) (size_garbage / 1024));
>  	}
>  	else
>  		printf("%lu objects, %lu kilobytes\n",

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/3] count-objects: report garbage files in pack directory too
  2013-02-08 18:44     ` Junio C Hamano
@ 2013-02-09  1:58       ` Duy Nguyen
  0 siblings, 0 replies; 14+ messages in thread
From: Duy Nguyen @ 2013-02-09  1:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sat, Feb 9, 2013 at 1:44 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> +static void real_report_pack_garbage(const char *path, int len, const char *name)
>> +{
>
> Don't some callers call this on paths outside objects/pack/
> directory?  Is it still report-pack-garbage?

In fact 3/3 uses it to report loose garbage. Will rename.

>> +static const char *known_pack_extensions[] = { ".pack", ".keep", NULL };
>
> This sounds wrong.  Isn't ".idx" also known?

I had a comment saying "all known extensions related to a pack, except
.idx" but I dropped it. .idx being used as the pointer file needs to
be handled separately.

>> @@ -1024,14 +1074,37 @@ static void prepare_packed_git_one(char *objdir, int local)
>>               int namelen = strlen(de->d_name);
>>               struct packed_git *p;
>>
>> -             if (!has_extension(de->d_name, ".idx"))
>> +             if (len + namelen + 1 > sizeof(path)) {
>> +                     if (report_pack_garbage)
>> +                             report_pack_garbage(path, len - 1, de->d_name);
>
> A pack/in/a/very/long/path/pack-0000000000000000000000000000000000000000.pack
> may pass when fed to "git verify-pack", but this will report it as "garbage",
> without reporting what is wrong with it.  Wouldn't that confuse users?

If all other git commands do not recognize the pack, then I think it's
still considered garbage. We could either

 - make prepare_packed_git_one accept pack/in/a/very/long/path-...
 - show the reason why we consider it garbage

Which option do we do? Option 1 may involve chdir in, stat stat stat
and chdir out to get around short PATH_MAX limit on some system.

>> -             if (len + namelen + 1 > sizeof(path))
>> +             if (!has_extension(de->d_name, ".idx")) {
>> +                     struct string_list_item *item;
>> +                     int i, n;
>> +                     if (!report_pack_garbage)
>> +                             continue;
>> +                     if (is_dot_or_dotdot(de->d_name))
>> +                             continue;
>> +                     for (i = 0; known_pack_extensions[i]; i++)
>> +                             if (has_extension(de->d_name,
>> +                                               known_pack_extensions[i]))
>> +                                     break;
>> +                     if (!known_pack_extensions[i]) {
>> +                             report_pack_garbage(path, 0, NULL);
>> +                             continue;
>> +                     }
>> +                     n = strlen(path) - strlen(known_pack_extensions[i]);
>> +                     item = string_list_append_nodup(&garbage,
>> +                                                     xstrndup(path, n));
>> +                     item->util = (void*)known_pack_extensions[i];
>>                       continue;
>> +             }
>
> Why isn't this part more like this?
>
>                 if (dot-or-dotdot) {
>                         continue;
>                 } else if (has_extension(de->d_name, ".idx")) {
>                         do things for the .idx file;
>                 } else if (has_extension(de->d_name, ".pack") {
>                         do things for the .pack file, including
>                         queuing the name if we haven't seen
>                         corresponding .idx for later examination;
>                 } else if (has_extension(de->d_name, ".keep") {
>                         nothing special for now but we may
>                         want to add some other checks later
>                 } else {
>                         everything else is a garbage
>                         report_pack_garbage();
>                 }

Originally I checked known_extensions again in report_pack_garbage()
but after restructuring, yeah we can drop known_extensions and do it
your way. Looks much clearer.
-- 
Duy

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-02-09  1:59 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-04 12:49 [PATCH 1/2] git-count-objects.txt: describe each line in -v output Nguyễn Thái Ngọc Duy
2013-02-04 12:49 ` [PATCH 2/2] count-objects: report garbage files in .git/objects/pack directory too Nguyễn Thái Ngọc Duy
2013-02-04 17:06   ` Junio C Hamano
2013-02-04 18:16   ` Junio C Hamano
2013-02-07  7:37     ` Duy Nguyen
2013-02-07 18:12       ` Junio C Hamano
2013-02-07 23:58         ` Duy Nguyen
2013-02-08  3:48 ` [PATCH v2 0/3] count-objects improvements Nguyễn Thái Ngọc Duy
2013-02-08  3:48   ` [PATCH v2 1/3] git-count-objects.txt: describe each line in -v output Nguyễn Thái Ngọc Duy
2013-02-08  3:48   ` [PATCH v2 2/3] count-objects: report garbage files in pack directory too Nguyễn Thái Ngọc Duy
2013-02-08 18:44     ` Junio C Hamano
2013-02-09  1:58       ` Duy Nguyen
2013-02-08  3:48   ` [PATCH v2 3/3] count-objects: report how much disk space taken by garbage files Nguyễn Thái Ngọc Duy
2013-02-08 18:47     ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.