All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch] Fix a corner case in git update-index --index-info
@ 2008-12-13 13:03 Thomas Jarosch
  2008-12-13 19:29 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Jarosch @ 2008-12-13 13:03 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Fix a corner case in git update-index --index-info:
If there are no input lines, it won't create an empty index.

Here's a short test for this:
echo -n "" |GIT_INDEX_FILE=index.new git update-index --index-info
-> The index "index.new" won't get created

It failed for me while I was using
git filter-branch as described in the man page:

    git filter-branch --index-filter \
            ´git ls-files -s | sed "s-\t-&newsubdir/-" |
                    GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
                            git update-index --index-info &&
            mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE´ HEAD

The conversion aborted because the first commit was empty.
(created by cvs2svn)

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>

diff --git a/builtin-update-index.c b/builtin-update-index.c
index 65d5775..998a48e 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -299,6 +299,7 @@ static void read_index_info(int line_termination)
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct strbuf uq = STRBUF_INIT;
+	int found_something = 0;
 
 	while (strbuf_getline(&buf, stdin, line_termination) != EOF) {
 		char *ptr, *tab;
@@ -308,6 +309,8 @@ static void read_index_info(int line_termination)
 		unsigned long ul;
 		int stage;
 
+		found_something = 1;
+
 		/* This reads lines formatted in one of three formats:
 		 *
 		 * (1) mode         SP sha1          TAB path
@@ -383,6 +386,11 @@ static void read_index_info(int line_termination)
 	bad_line:
 		die("malformed index info %s", buf.buf);
 	}
+
+	/* Force creation of empty index - needed by git filter-branch */
+	if (!found_something)
+		active_cache_changed = 1;
+
 	strbuf_release(&buf);
 	strbuf_release(&uq);
 }

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [patch] Fix a corner case in git update-index --index-info
  2008-12-13 13:03 [patch] Fix a corner case in git update-index --index-info Thomas Jarosch
@ 2008-12-13 19:29 ` Junio C Hamano
  2008-12-15 10:52   ` Thomas Jarosch
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2008-12-13 19:29 UTC (permalink / raw)
  To: Thomas Jarosch; +Cc: git

Thomas Jarosch <thomas.jarosch@intra2net.com> writes:

> Fix a corner case in git update-index --index-info:
> If there are no input lines, it won't create an empty index.
>
> Here's a short test for this:
> echo -n "" |GIT_INDEX_FILE=index.new git update-index --index-info
> -> The index "index.new" won't get created
>
> It failed for me while I was using
> git filter-branch as described in the man page:
>
>     git filter-branch --index-filter \
>             ´git ls-files -s | sed "s-\t-&newsubdir/-" |
>                     GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
>                             git update-index --index-info &&
>             mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE´ HEAD
>
> The conversion aborted because the first commit was empty.
> (created by cvs2svn)

If you are doing a filter-branch and the commits near the beginning of the
history did not have any path you are interested in, I do not think you
would want to even create corresponding commits for them that record an
empty tree to begin with, so I do not necessarily agree with the above
command line.  The mv would fail due to absense of index.new file, and you
can take it as a sign that you can skip that commit.

Outside the context of your command line above, I am slightly more
sympathetic than neutral to the argument that "update-index --index-info"
(and "update-index --stdin", which I suspect would have the same issue,
but I did not check) should create an output file if one did not exist.

You should note however that such a change would rob from you a way to
detect that you did not feed anything to the command by checking the lack
of the output.  Such a change would break people's existing scripts that
relied on the existing behaviour; one example is that the above "The mv
would fail...and you can" would be made impossible.

> @@ -308,6 +309,8 @@ static void read_index_info(int line_termination)
>  		unsigned long ul;
>  		int stage;
>  
> +		found_something = 1;
> +
>  		/* This reads lines formatted in one of three formats:
>  		 *
>  		 * (1) mode         SP sha1          TAB path
> @@ -383,6 +386,11 @@ static void read_index_info(int line_termination)
>  	bad_line:
>  		die("malformed index info %s", buf.buf);
>  	}
> +
> +	/* Force creation of empty index - needed by git filter-branch */

As I already mentioned, I do not agree with this "needed by" at all.

> +	if (!found_something)
> +		active_cache_changed = 1;
> +
>  	strbuf_release(&buf);
>  	strbuf_release(&uq);
>  }

I think this implementation is conceptually wrong, even if we assume it is
the right thing to always create a new file.  The --index-info mode may
well be fed with the same information as it already records, in which case
active_cache_changed shouldn't be toggled, and if it is fed something
different from what is recorded, active_cache_changed should be marked as
changed, and that decision should be left to the add_cache_entry() that is
called from add_cacheinfo().  What you did is to make it _always_ write
the new index out, even if we started with an existing index, and there
was no change, or even if we started with missing index, and there was no
input.  You only wanted the latter but you did both.

If that is what you want to do, you can get rid of found_something logic
altogether and flip active_cache_changed unconditionally to do the same
thing, but it feels wrong to write the index out when nothing changed.

Since I said I am slightly more sympathetic than neutral, here is a patch
that forces creation of an empty index file without making it write out
the same thing if the index already existed.

But again, this would break people who have been relying on the existing
behaviour that no resulting file, when GIT_INDEX_FILE points at a
nonexistent file, signals no operation.

I think it is a bad idea to do this in -rc period, even if we were to
change the semantics.

 builtin-update-index.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git c/builtin-update-index.c w/builtin-update-index.c
index 65d5775..9abc0b2 100644
--- c/builtin-update-index.c
+++ w/builtin-update-index.c
@@ -566,6 +566,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	char set_executable_bit = 0;
 	unsigned int refresh_flags = 0;
 	int lock_error = 0;
+	int was_unborn;
 	struct lock_file *lock_file;
 
 	git_config(git_default_config, NULL);
@@ -580,6 +581,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	entries = read_cache();
 	if (entries < 0)
 		die("cache corrupted");
+	was_unborn = is_cache_unborn();
 
 	for (i = 1 ; i < argc; i++) {
 		const char *path = argv[i];
@@ -677,7 +679,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 					die("--index-info must be at the end");
 				allow_add = allow_replace = allow_remove = 1;
 				read_index_info(line_termination);
-				break;
+				goto finish;
 			}
 			if (!strcmp(path, "--unresolve")) {
 				has_errors = do_unresolve(argc - i, argv + i,
@@ -738,7 +740,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	}
 
  finish:
-	if (active_cache_changed) {
+	if (active_cache_changed || was_unborn) {
 		if (newfd < 0) {
 			if (refresh_flags & REFRESH_QUIET)
 				exit(128);

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [patch] Fix a corner case in git update-index --index-info
  2008-12-13 19:29 ` Junio C Hamano
@ 2008-12-15 10:52   ` Thomas Jarosch
  0 siblings, 0 replies; 3+ messages in thread
From: Thomas Jarosch @ 2008-12-15 10:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Saturday, 13. December 2008 20:29:12 you wrote:
> If you are doing a filter-branch and the commits near the beginning of the
> history did not have any path you are interested in, I do not think you
> would want to even create corresponding commits for them that record an
> empty tree to begin with, so I do not necessarily agree with the above
> command line.  The mv would fail due to absense of index.new file, and you
> can take it as a sign that you can skip that commit.

True. I killed the empty commit later using rebase -i. Great tool :-)

> Outside the context of your command line above, I am slightly more
> sympathetic than neutral to the argument that "update-index --index-info"
> (and "update-index --stdin", which I suspect would have the same issue,
> but I did not check) should create an output file if one did not exist.
>
> You should note however that such a change would rob from you a way to
> detect that you did not feed anything to the command by checking the lack
> of the output.  Such a change would break people's existing scripts that
> relied on the existing behaviour; one example is that the above "The mv
> would fail...and you can" would be made impossible.

That is also true. OTOH there would be no way to create an empty tree,
f.e. if you do positive filtering like --subdirectory-filter
just with multiple subdirs:

git filter-branch --tag-name-filter cat --index-filter \
    'git ls-files -s |grep -P "\t(DIR1|DIR2)" \
    |GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
    mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE' -- --all

Later on I removed all empty commits in a second run.

> > +	if (!found_something)
> > +		active_cache_changed = 1;
> > +
> >  	strbuf_release(&buf);
> >  	strbuf_release(&uq);
> >  }
>
> I think this implementation is conceptually wrong, even if we assume it is
> the right thing to always create a new file.  The --index-info mode may
> well be fed with the same information as it already records, in which case
> active_cache_changed shouldn't be toggled, and if it is fed something
> different from what is recorded, active_cache_changed should be marked as
> changed, and that decision should be left to the add_cache_entry() that is
> called from add_cacheinfo().  What you did is to make it _always_ write
> the new index out, even if we started with an existing index, and there
> was no change, or even if we started with missing index, and there was no
> input.  You only wanted the latter but you did both.

The idea was to toggle the active_cache_changed variable only if we didn't get 
a single line of input from stdin. If you feed back the same index information
f.e. via "git ls-tree -s", the active_cache_changed=1 code shouldn't be 
executed. Though I didn't explicitly test this case, so I guess you are right.

> But again, this would break people who have been relying on the existing
> behaviour that no resulting file, when GIT_INDEX_FILE points at a
> nonexistent file, signals no operation.

See my remark about "positive list" filtering above.

> I think it is a bad idea to do this in -rc period, even if we were to
> change the semantics.

Yes, this is something one doesn't want in a -rc :-)

Thanks for your implementation.

btw: I sent a small documentation update to the list and forgot to add you
to the CC: list. The subject line was

"[patch] documentation: Explain how to free up space after filter-branch"

Cheers,
Thomas

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2008-12-15 10:54 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-13 13:03 [patch] Fix a corner case in git update-index --index-info Thomas Jarosch
2008-12-13 19:29 ` Junio C Hamano
2008-12-15 10:52   ` Thomas Jarosch

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.