All of lore.kernel.org
 help / color / mirror / Atom feed
* update-index --index-info producing spurious submodule commits
@ 2011-08-18 21:53 Greg Troxel
  2011-08-18 22:49 ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Greg Troxel @ 2011-08-18 21:53 UTC (permalink / raw)
  To: git; +Cc: Richard Hansen


[-- Attachment #1.1: Type: text/plain, Size: 2624 bytes --]


For reasons too complicated to go into, I have a repository B which has
essentially been cloned from A, and there has been vast amounts of work
on B (thousands of commits, many branches).   These changes have not
been merged back to A.  I want to merge them back, but there's a
directory foo that has changes in B that I can't release.

So, I ran filter-branch with an index filter

  found the merge base with A
  removed foo
  did ls-tree on foo from merge base
    and updated the index

The theory is to make each commit in B look like no changes to anything
under foo, and otherwise the same.

After doing this, Richard noticed that the root tree of commits had a
foo object, but that it was labeled a commit instead of a tree (but in
fact it is a tree).  He noticed because diffs looked like submodules.

I was able to produce a minimal test case, output below, script
attached.  The below output is with 1.7.5.4 on NetBSD/i386 (and /amd64).
1.7.6 (ubuntu/amd64) has the same problem.

So:

  Am I using "git update-index --index-info" wrong?

  Or is there a bug?

Thanks,
Greg

Notice that "cat-file -p HEAD:" shows a tree before, and a commit
afterwards:


+ git init
Initialized empty Git repository in /usr/home/gdt/GIT_TEST/.git/
+ mkdir foo
+ touch foo/bar
+ git add foo
+ git commit -minitial content
[master (root-commit) 6755919] initial content
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 foo/bar
+ git cat-file -p HEAD
tree 72d67e6de0599f72f1265c925316f91f78395787
author Greg Troxel <gdt@ir.bbn.com> 1313703545 -0400
committer Greg Troxel <gdt@ir.bbn.com> 1313703545 -0400

initial content
+ git cat-file -p HEAD:
040000 tree d87cbcba0e2ede0752bdafc5938da35546803ba5	foo
+ git rm -r foo
rm 'foo/bar'
+ git ls-tree HEAD foo
040000 tree d87cbcba0e2ede0752bdafc5938da35546803ba5	foo
+ git ls-tree HEAD foo
+ git update-index --index-info
+ git diff --staged
diff --git a/foo b/foo
new file mode 160000
index 0000000..d87cbcb
--- /dev/null
+++ b/foo
@@ -0,0 +1 @@
+Subproject commit d87cbcba0e2ede0752bdafc5938da35546803ba5
diff --git a/foo/bar b/foo/bar
deleted file mode 100644
index e69de29..0000000
+ git commit -mmunged foo
[master 3348447] munged foo
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 160000 foo
 delete mode 100644 foo/bar
+ git cat-file -p HEAD
tree 04fbd499dbd01afb3241d7f0af8171fde008bfe3
parent 6755919e289665ec46d270672d29b594f992fa03
author Greg Troxel <gdt@ir.bbn.com> 1313703545 -0400
committer Greg Troxel <gdt@ir.bbn.com> 1313703545 -0400

munged foo
+ git cat-file -p HEAD:
160000 commit d87cbcba0e2ede0752bdafc5938da35546803ba5	foo



[-- Attachment #1.2: git-example --]
[-- Type: text/plain, Size: 372 bytes --]

#!/bin/sh

if [ -d .git ]; then
   echo "existing .git"
   exit 1
fi

set -x

git init
mkdir foo
touch foo/bar
git add foo
git commit -m'initial content'
git cat-file -p HEAD
git cat-file -p HEAD:
git rm -r foo
git ls-tree HEAD foo
git ls-tree HEAD foo | git update-index --index-info
git diff --staged
git commit -m'munged foo'
git cat-file -p HEAD
git cat-file -p HEAD:

[-- Attachment #2: Type: application/pgp-signature, Size: 194 bytes --]

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: update-index --index-info producing spurious submodule commits
  2011-08-18 21:53 update-index --index-info producing spurious submodule commits Greg Troxel
@ 2011-08-18 22:49 ` Junio C Hamano
  2011-08-19  0:27   ` Greg Troxel
  2011-08-19 17:20   ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Junio C Hamano @ 2011-08-18 22:49 UTC (permalink / raw)
  To: Greg Troxel; +Cc: git, Richard Hansen

Greg Troxel <gdt@ir.bbn.com> writes:

> git ls-tree HEAD foo
> git ls-tree HEAD foo | git update-index --index-info

This --index-info definitely looks wrong, if "foo" is a directory, as the
entries in the index are supposed to be either blobs or commits.

As "update-index --index-info" predates "submodule" by a few years or
more, I wouldn't be surprised if the code didn't notice it was fed a wrong
input and produced nonsensical result that happened to be a commit.

The command could just instead barf, saying the input is wrong, but the
option was so low-level that it was deliberately written to accept and
store anything you throw at it --- even when it is nonsensical for the
version of plumbing, later updates to the data structure might have made
it making sense, which was the way to ease development of the system.

By now, we should start enforcing more sanity on its input.

 builtin/update-index.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index a6a23fa..4b32bfe 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -220,6 +220,12 @@ static int process_path(const char *path)
 	return add_one_path(ce, path, len, &st);
 }
 
+static int verify_mode(unsigned int mode)
+{
+	return (mode == 0160000 || mode == 0120000 ||
+		mode == 0100644 || mode == 0100755);
+}
+
 static int add_cacheinfo(unsigned int mode, const unsigned char *sha1,
 			 const char *path, int stage)
 {
@@ -229,6 +235,9 @@ static int add_cacheinfo(unsigned int mode, const unsigned char *sha1,
 	if (!verify_path(path))
 		return error("Invalid path '%s'", path);
 
+	if (!verify_mode(mode))
+		return error("Invalid mode '%o'", mode);
+
 	len = strlen(path);
 	size = cache_entry_size(len);
 	ce = xcalloc(1, size);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: update-index --index-info producing spurious submodule commits
  2011-08-18 22:49 ` Junio C Hamano
@ 2011-08-19  0:27   ` Greg Troxel
  2011-08-19  4:16     ` Junio C Hamano
  2011-08-19 17:20   ` Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Greg Troxel @ 2011-08-19  0:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Richard Hansen

[-- Attachment #1: Type: text/plain, Size: 1535 bytes --]


Junio C Hamano <gitster@pobox.com> writes:

> Greg Troxel <gdt@ir.bbn.com> writes:
>
>> git ls-tree HEAD foo
>> git ls-tree HEAD foo | git update-index --index-info
>
> This --index-info definitely looks wrong, if "foo" is a directory, as the
> entries in the index are supposed to be either blobs or commits.

In the man page for update-index in the --index-info section, what I'm
doing seems to be covered by point 2, which specifically talks about
output of ls-tree.

I realize the index is a data structure that has pairs of paths to
blobs.  But I also think of it as a representation of a tree object that
would be referenced were one to commit (even if it isn't in that form
yet).  So I would argue that update-index with a tree should walk that
tree and insert all the paths resulting from expansion into the index?

> The command could just instead barf, saying the input is wrong, but the
> option was so low-level that it was deliberately written to accept and
> store anything you throw at it --- even when it is nonsensical for the
> version of plumbing, later updates to the data structure might have made
> it making sense, which was the way to ease development of the system.

If what I'm doing is an abuse of update-index, do you or anyone else
have a suggestion to make a directory in the index match a tree object?
(I'm trying to use an index filter; it takes 11 hours to run
filter-branch (5500 commits, 400K files in index, 800MB .git, ~2+GB
working directory, tmpdir on NetBSD tmpfs (all in ram)).)

Thanks,
Greg

[-- Attachment #2: Type: application/pgp-signature, Size: 194 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: update-index --index-info producing spurious submodule commits
  2011-08-19  0:27   ` Greg Troxel
@ 2011-08-19  4:16     ` Junio C Hamano
  2011-08-19 11:00       ` Greg Troxel
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2011-08-19  4:16 UTC (permalink / raw)
  To: Greg Troxel; +Cc: git, Richard Hansen

Greg Troxel <gdt@ir.bbn.com> writes:

> If what I'm doing is an abuse of update-index, do you or anyone else
> have a suggestion to make a directory in the index match a tree object?

"ls-tree -r HEAD foo" is probably what you meant to say.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: update-index --index-info producing spurious submodule commits
  2011-08-19  4:16     ` Junio C Hamano
@ 2011-08-19 11:00       ` Greg Troxel
  2011-08-20  2:15         ` Jonathan Nieder
  0 siblings, 1 reply; 7+ messages in thread
From: Greg Troxel @ 2011-08-19 11:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Richard Hansen

[-- Attachment #1: Type: text/plain, Size: 1343 bytes --]


Junio C Hamano <gitster@pobox.com> writes:

> Greg Troxel <gdt@ir.bbn.com> writes:
>
>> If what I'm doing is an abuse of update-index, do you or anyone else
>> have a suggestion to make a directory in the index match a tree object?
>
> "ls-tree -r HEAD foo" is probably what you meant to say.

Thanks very much for the clue - that works.  The update-index
documentation should probably say that only blobs (or perhaps commits
intended to be submodules??) are acceptable, and perhaps say "ls-tree
-r" instead of ls-tree.  It could easily seem reasonable to someone to
pass in a tree and expect that to result in that tree being logically in
the index and to appear in the resulting commit.

For anyone else who is trying to do something like this, here's a
revised script that (I think) correctly reverts a directory to another commit.

#!/bin/sh

if [ -d .git ]; then
   echo "existing .git"
   exit 1
fi

set -x

git init
mkdir foo
touch foo/bar
git add foo
git commit -m'initial content'
git tag initial

touch foo/baz
git add foo/baz
git commit -m 'add baz'

git cat-file -p initial
git cat-file -p initial:
git rm --cached -r foo
git ls-tree initial foo
git ls-tree -r initial foo
git ls-tree -r initial foo | git update-index --index-info

git diff
git diff --staged
git commit -m'munged foo'

git cat-file -p HEAD
git cat-file -p HEAD:

[-- Attachment #2: Type: application/pgp-signature, Size: 194 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: update-index --index-info producing spurious submodule commits
  2011-08-18 22:49 ` Junio C Hamano
  2011-08-19  0:27   ` Greg Troxel
@ 2011-08-19 17:20   ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2011-08-19 17:20 UTC (permalink / raw)
  To: git; +Cc: Greg Troxel, Richard Hansen

Junio C Hamano <gitster@pobox.com> writes:

> As "update-index --index-info" predates "submodule" by a few years or
> more, I wouldn't be surprised if the code didn't notice it was fed a wrong
> input and produced nonsensical result that happened to be a commit.
>
> The command could just instead barf, saying the input is wrong, but the
> option was so low-level that it was deliberately written to accept and
> store anything you throw at it --- even when it is nonsensical for the
> version of plumbing, later updates to the data structure might have made
> it making sense, which was the way to ease development of the system.

The second paragraph needs a bit of clarification. What I meant to say was
that the --index-info and its command line cousin --cacheinfo interfaces
are designed to be used like using a hex editor on the disk block device
to modify the file system in a random way, and just like a hex editor does
not prevent you from writing a data to the disk that is not understood or
misunderstood by the current filesystem implementations, ideally it should
allow you to put data that is beyond the current design of the index, so
that it can be used as a way to experiment while developing enhancements
to the index further. That in fact was how I experimented with updates to
the code to read from the index (in read-cache.c) in early days. Also they
do not even look at the object name they are given, and that is very much
deliberate---otherwise you cannot even stuff gitlinks in the index---and
in general, the less sanity-checks we do in that interface, the better off
we will be. After all we may someday start adding a tree entry in the
index for a reason unknown to us today.

I am all for documenting that today's index holds only regular blobs (mode
100644), executable blobs (mode 100755), symlink blobs (mode 120000), and
gitlinks (mode 160000), somewhere in the general part of the document not
specific to these options, and also documenting that the result of the
operation is undefined if anything outside the officially supported kinds
of input is fed to --index-info/--cacheinfo.

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: update-index --index-info producing spurious submodule commits
  2011-08-19 11:00       ` Greg Troxel
@ 2011-08-20  2:15         ` Jonathan Nieder
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2011-08-20  2:15 UTC (permalink / raw)
  To: Greg Troxel; +Cc: Junio C Hamano, git, Richard Hansen

Greg Troxel wrote:
> Junio C Hamano <gitster@pobox.com> writes:

>> "ls-tree -r HEAD foo" is probably what you meant to say.
>
> Thanks very much for the clue - that works.  The update-index
> documentation should probably say that only blobs (or perhaps commits
> intended to be submodules??) are acceptable, and perhaps say "ls-tree
> -r" instead of ls-tree.

Makes sense.  Please make it so.

By the way, for this particular application I wonder if something like

	git ls-files -z <dir> | git update-index -z --force-remove --stdin
	git read-tree --prefix=<dir>/ <tree>

would be easier.  Or a commit-filter. :)

	tree=$1
	shift
	tree=$(
		git ls-tree -z "$tree" |
		perl -0ne '
			chop;
			my ($info, $name) = split(/\t/, $_, 2);
			if ($name eq "<dir>") {
				printf("040000 tree <good tree>\t<dir>\0");
			} else {
				printf("%s\0", $_);
			}
		' |
		git mktree -z
	)
	git commit-tree "$tree" "$@"

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-08-20  2:18 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-18 21:53 update-index --index-info producing spurious submodule commits Greg Troxel
2011-08-18 22:49 ` Junio C Hamano
2011-08-19  0:27   ` Greg Troxel
2011-08-19  4:16     ` Junio C Hamano
2011-08-19 11:00       ` Greg Troxel
2011-08-20  2:15         ` Jonathan Nieder
2011-08-19 17:20   ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.