* [PATCHv5 00/14] git notes
From: Johan Herland @ 2009-09-08  2:26 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

Yet another iteration of the 'git notes' feature. Rebased on top of 'next':
- Patches 1-9 are unchanged from (patches 1-7, 11-12 of) the last iteration.
- Patch 10 teaches the notes code to free its data structures on request.
- Patch 11 introduces the 16-tree notes lookup code that handles SHA1-based
  fanout schemes. This is pretty much unchanged from patch 8 in the previous
  iteration.
- Patch 12 adds selftests that verify correct parsing of notes trees with
  various SHA1-based fanouts.
- Patch 13 introduces a flexible parser for a variety of date-based and
  SHA1-based fanout schemes. This is the interesting part, as far as this
  iteration is concerned.
- Patch 14 adds selftests that verify correct parsing of notes trees with
  various date-based fanouts.

Note that the series does not yet include code for _writing_ notes into a
suitably structured notes tree. That will be done in a later iteration.

I have some performance numbers that I will send in a separate email.


Have fun! :)

...Johan


Johan Herland (9):
  Teach "-m <msg>" and "-F <file>" to "git notes edit"
  fast-import: Add support for importing commit notes
  t3302-notes-index-expensive: Speed up create_repo()
  Add flags to get_commit_notes() to control the format of the note string
  Teach notes code to free its internal data structures on request.
  Teach the notes lookup code to parse notes trees with various fanout schemes
  Selftests verifying semantics when loading notes trees with various fanouts
  Allow flexible organization of notes trees, using both commit date and SHA1
  Add test cases for date-based fanouts

Johannes Schindelin (5):
  Introduce commit notes
  Add a script to edit/inspect notes
  Speed up git notes lookup
  Add an expensive test for git-notes
  Add '%N'-format for pretty-printing commit notes

 .gitignore                        |    1 +
 Documentation/config.txt          |   13 +
 Documentation/git-fast-import.txt |   45 +++-
 Documentation/git-notes.txt       |   60 ++++
 Documentation/pretty-formats.txt  |    1 +
 Makefile                          |    3 +
 cache.h                           |    4 +
 command-list.txt                  |    1 +
 commit.c                          |    1 +
 config.c                          |    5 +
 environment.c                     |    1 +
 fast-import.c                     |   88 +++++-
 git-notes.sh                      |  121 +++++++
 notes.c                           |  673 +++++++++++++++++++++++++++++++++++++
 notes.h                           |   12 +
 pretty.c                          |   10 +
 t/t3301-notes.sh                  |  150 ++++++++
 t/t3302-notes-index-expensive.sh  |  118 +++++++
 t/t3303-notes-subtrees.sh         |  201 +++++++++++
 t/t9300-fast-import.sh            |  166 +++++++++
 20 files changed, 1664 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/git-notes.txt
 create mode 100755 git-notes.sh
 create mode 100644 notes.c
 create mode 100644 notes.h
 create mode 100755 t/t3301-notes.sh
 create mode 100755 t/t3302-notes-index-expensive.sh
 create mode 100755 t/t3303-notes-subtrees.sh


* [PATCHv5 01/14] Introduce commit notes
From: Johan Herland @ 2009-09-08  2:26 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce, Johannes Schindelin

From: Johannes Schindelin <Johannes.Schindelin@gmx.de>

Commit notes are blobs which are shown together with the commit
message.  These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.
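
As an illustration only (not part of the patch), the precedence described
above could be sketched as follows. The helper names resolve_notes_ref()
and current_notes_ref() are hypothetical; the default ref and the
environment variable name are the ones the patch defines in cache.h.

```c
#include <stdlib.h>
#include <string.h>

/* Same value as GIT_NOTES_DEFAULT_REF in the patch. */
#define NOTES_DEFAULT_REF "refs/notes/commits"

/*
 * Hypothetical helper: pick the notes ref from the environment value,
 * then the config value, then the built-in default. Either argument
 * may be NULL when the corresponding source is unset.
 */
const char *resolve_notes_ref(const char *env_value,
			      const char *config_value)
{
	if (env_value)
		return env_value;
	if (config_value)
		return config_value;
	return NOTES_DEFAULT_REF;
}

/*
 * Hypothetical convenience wrapper: read GIT_NOTES_REF from the real
 * environment; config_value stands in for core.notesRef.
 */
const char *current_notes_ref(const char *config_value)
{
	return resolve_notes_ref(getenv("GIT_NOTES_REF"), config_value);
}
```

Calling current_notes_ref(NULL) with GIT_NOTES_REF unset would yield the
default "refs/notes/commits". Note that the patch additionally drops the
ref name if the ref cannot be read, which this sketch leaves out.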

The notes ref is a branch which contains "files" whose names are
the names of the corresponding commits (i.e. the SHA-1).

The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.

This patch has been improved by the following contributions:
- Thomas Rast: fix core.notesRef documentation
- Tor Arne Vestbø: fix printing of multi-line notes
- Alex Riesen: Using char array instead of char pointer costs less BSS

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt |   13 +++++++++
 Makefile                 |    2 +
 cache.h                  |    4 +++
 commit.c                 |    1 +
 config.c                 |    5 +++
 environment.c            |    1 +
 notes.c                  |   68 ++++++++++++++++++++++++++++++++++++++++++++++
 notes.h                  |    7 +++++
 pretty.c                 |    5 +++
 9 files changed, 106 insertions(+), 0 deletions(-)
 create mode 100644 notes.c
 create mode 100644 notes.h

diff --git a/Documentation/config.txt b/Documentation/config.txt
index d445395..728d787 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -443,6 +443,19 @@ On some file system/operating system combinations, this is unreliable.
 Set this config setting to 'rename' there; However, This will remove the
 check that makes sure that existing object files will not get overwritten.
 
+core.notesRef::
+	When showing commit messages, also show notes which are stored in
+	the given ref.  This ref is expected to contain files named
+	after the full SHA-1 of the commit they annotate.
++
+If such a file exists in the given ref, the referenced blob is read, and
+appended to the commit message, separated by a "Notes:" line.  If the
+given ref itself does not exist, it is not an error, but means that no
+notes should be printed.
++
+This setting defaults to "refs/notes/commits", and can be overridden by
+the `GIT_NOTES_REF` environment variable.
+
 add.ignore-errors::
 	Tells 'git-add' to continue adding files when some files cannot be
 	added due to indexing errors. Equivalent to the '--ignore-errors'
diff --git a/Makefile b/Makefile
index ce882d0..37b8f85 100644
--- a/Makefile
+++ b/Makefile
@@ -426,6 +426,7 @@ LIB_H += ll-merge.h
 LIB_H += log-tree.h
 LIB_H += mailmap.h
 LIB_H += merge-recursive.h
+LIB_H += notes.h
 LIB_H += object.h
 LIB_H += pack.h
 LIB_H += pack-refs.h
@@ -509,6 +510,7 @@ LIB_OBJS += match-trees.o
 LIB_OBJS += merge-file.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += name-hash.o
+LIB_OBJS += notes.o
 LIB_OBJS += object.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-refs.o
diff --git a/cache.h b/cache.h
index ae324c9..917138b 100644
--- a/cache.h
+++ b/cache.h
@@ -371,6 +371,8 @@ static inline enum object_type object_type(unsigned int mode)
 #define GITATTRIBUTES_FILE ".gitattributes"
 #define INFOATTRIBUTES_FILE "info/attributes"
 #define ATTRIBUTE_MACRO_PREFIX "[attr]"
+#define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
+#define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
 
 extern int is_bare_repository_cfg;
 extern int is_bare_repository(void);
@@ -565,6 +567,8 @@ enum object_creation_mode {
 
 extern enum object_creation_mode object_creation_mode;
 
+extern char *notes_ref_name;
+
 extern int grafts_replace_parents;
 
 #define GIT_REPO_VERSION 0
diff --git a/commit.c b/commit.c
index a6c6f70..a0a77a6 100644
--- a/commit.c
+++ b/commit.c
@@ -5,6 +5,7 @@
 #include "utf8.h"
 #include "diff.h"
 #include "revision.h"
+#include "notes.h"
 
 int save_commit_buffer = 1;
 
diff --git a/config.c b/config.c
index e87edea..70a7d34 100644
--- a/config.c
+++ b/config.c
@@ -467,6 +467,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.notesref")) {
+		notes_ref_name = xstrdup(value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.pager"))
 		return git_config_string(&pager_program, var, value);
 
diff --git a/environment.c b/environment.c
index 5de6837..571ab56 100644
--- a/environment.c
+++ b/environment.c
@@ -49,6 +49,7 @@ enum push_default_type push_default = PUSH_DEFAULT_MATCHING;
 #define OBJECT_CREATION_MODE OBJECT_CREATION_USES_HARDLINKS
 #endif
 enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
+char *notes_ref_name;
 int grafts_replace_parents = 1;
 
 /* Parallel index stat data preload? */
diff --git a/notes.c b/notes.c
new file mode 100644
index 0000000..401966d
--- /dev/null
+++ b/notes.c
@@ -0,0 +1,68 @@
+#include "cache.h"
+#include "commit.h"
+#include "notes.h"
+#include "refs.h"
+#include "utf8.h"
+#include "strbuf.h"
+
+static int initialized;
+
+void get_commit_notes(const struct commit *commit, struct strbuf *sb,
+		const char *output_encoding)
+{
+	static const char utf8[] = "utf-8";
+	struct strbuf name = STRBUF_INIT;
+	unsigned char sha1[20];
+	char *msg, *msg_p;
+	unsigned long linelen, msglen;
+	enum object_type type;
+
+	if (!initialized) {
+		const char *env = getenv(GIT_NOTES_REF_ENVIRONMENT);
+		if (env)
+			notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
+		else if (!notes_ref_name)
+			notes_ref_name = GIT_NOTES_DEFAULT_REF;
+		if (notes_ref_name && read_ref(notes_ref_name, sha1))
+			notes_ref_name = NULL;
+		initialized = 1;
+	}
+
+	if (!notes_ref_name)
+		return;
+
+	strbuf_addf(&name, "%s:%s", notes_ref_name,
+			sha1_to_hex(commit->object.sha1));
+	if (get_sha1(name.buf, sha1))
+		return;
+
+	if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
+			type != OBJ_BLOB)
+		return;
+
+	if (output_encoding && *output_encoding &&
+			strcmp(utf8, output_encoding)) {
+		char *reencoded = reencode_string(msg, output_encoding, utf8);
+		if (reencoded) {
+			free(msg);
+			msg = reencoded;
+			msglen = strlen(msg);
+		}
+	}
+
+	/* we will end the annotation by a newline anyway */
+	if (msglen && msg[msglen - 1] == '\n')
+		msglen--;
+
+	strbuf_addstr(sb, "\nNotes:\n");
+
+	for (msg_p = msg; msg_p < msg + msglen; msg_p += linelen + 1) {
+		linelen = strchrnul(msg_p, '\n') - msg_p;
+
+		strbuf_addstr(sb, "    ");
+		strbuf_add(sb, msg_p, linelen);
+		strbuf_addch(sb, '\n');
+	}
+
+	free(msg);
+}
diff --git a/notes.h b/notes.h
new file mode 100644
index 0000000..79d21b6
--- /dev/null
+++ b/notes.h
@@ -0,0 +1,7 @@
+#ifndef NOTES_H
+#define NOTES_H
+
+void get_commit_notes(const struct commit *commit, struct strbuf *sb,
+		const char *output_encoding);
+
+#endif
diff --git a/pretty.c b/pretty.c
index f5983f8..e25db81 100644
--- a/pretty.c
+++ b/pretty.c
@@ -6,6 +6,7 @@
 #include "string-list.h"
 #include "mailmap.h"
 #include "log-tree.h"
+#include "notes.h"
 #include "color.h"
 
 static char *user_format;
@@ -975,5 +976,9 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
 	 */
 	if (fmt == CMIT_FMT_EMAIL && sb->len <= beginning_of_body)
 		strbuf_addch(sb, '\n');
+
+	if (fmt != CMIT_FMT_ONELINE)
+		get_commit_notes(commit, sb, encoding);
+
 	free(reencoded);
 }
-- 
1.6.4.304.g1365c.dirty


* [PATCHv5 02/14] Add a script to edit/inspect notes
From: Johan Herland @ 2009-09-08  2:26 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce, Johannes Schindelin

From: Johannes Schindelin <Johannes.Schindelin@gmx.de>

The script 'git notes' allows you to edit and show commit notes, by
calling either

	git notes show <commit>

or

	git notes edit <commit>

This patch has been improved by the following contributions:
- Tor Arne Vestbø: fix printing of multi-line notes
- Michael J Gruber: test and handle empty notes gracefully
- Thomas Rast:
  - only clean up message file when editing
  - use GIT_EDITOR and core.editor over VISUAL/EDITOR
  - t3301: fix confusing quoting in test for valid notes ref
  - t3301: use test_must_fail instead of !
  - refuse to edit notes outside refs/notes/
- Junio C Hamano: tests: fix "export var=val"
- Christian Couder: documentation: fix 'linkgit' macro in "git-notes.txt"
- Johan Herland: minor cleanup and bugfixing in git-notes.sh (v2)

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 .gitignore                  |    1 +
 Documentation/git-notes.txt |   46 +++++++++++++++++
 Makefile                    |    1 +
 command-list.txt            |    1 +
 git-notes.sh                |   73 +++++++++++++++++++++++++++
 t/t3301-notes.sh            |  114 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 236 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-notes.txt
 create mode 100755 git-notes.sh
 create mode 100755 t/t3301-notes.sh

diff --git a/.gitignore b/.gitignore
index c446290..703241b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -86,6 +86,7 @@ git-mktag
 git-mktree
 git-name-rev
 git-mv
+git-notes
 git-pack-redundant
 git-pack-objects
 git-pack-refs
diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt
new file mode 100644
index 0000000..7136016
--- /dev/null
+++ b/Documentation/git-notes.txt
@@ -0,0 +1,46 @@
+git-notes(1)
+============
+
+NAME
+----
+git-notes - Add/inspect commit notes
+
+SYNOPSIS
+--------
+[verse]
+'git-notes' (edit | show) [commit]
+
+DESCRIPTION
+-----------
+This command allows you to add notes to commit messages, without
+changing the commit.  To discern these notes from the message stored
+in the commit object, the notes are indented like the message, after
+an unindented line saying "Notes:".
+
+To disable commit notes, you have to set the config variable
+core.notesRef to the empty string.  Alternatively, you can set it
+to a different ref, something like "refs/notes/bugzilla".  This setting
+can be overridden by the environment variable "GIT_NOTES_REF".
+
+
+SUBCOMMANDS
+-----------
+
+edit::
+	Edit the notes for a given commit (defaults to HEAD).
+
+show::
+	Show the notes for a given commit (defaults to HEAD).
+
+
+Author
+------
+Written by Johannes Schindelin <johannes.schindelin@gmx.de>
+
+Documentation
+-------------
+Documentation by Johannes Schindelin
+
+GIT
+---
+Part of the linkgit:git[7] suite
diff --git a/Makefile b/Makefile
index 37b8f85..8e8376c 100644
--- a/Makefile
+++ b/Makefile
@@ -316,6 +316,7 @@ SCRIPT_SH += git-merge-one-file.sh
 SCRIPT_SH += git-merge-resolve.sh
 SCRIPT_SH += git-mergetool.sh
 SCRIPT_SH += git-mergetool--lib.sh
+SCRIPT_SH += git-notes.sh
 SCRIPT_SH += git-parse-remote.sh
 SCRIPT_SH += git-pull.sh
 SCRIPT_SH += git-quiltimport.sh
diff --git a/command-list.txt b/command-list.txt
index fb03a2e..4296941 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -74,6 +74,7 @@ git-mktag                               plumbingmanipulators
 git-mktree                              plumbingmanipulators
 git-mv                                  mainporcelain common
 git-name-rev                            plumbinginterrogators
+git-notes                               mainporcelain
 git-pack-objects                        plumbingmanipulators
 git-pack-redundant                      plumbinginterrogators
 git-pack-refs                           ancillarymanipulators
diff --git a/git-notes.sh b/git-notes.sh
new file mode 100755
index 0000000..f06c254
--- /dev/null
+++ b/git-notes.sh
@@ -0,0 +1,73 @@
+#!/bin/sh
+
+USAGE="(edit | show) [commit]"
+. git-sh-setup
+
+test -n "$3" && usage
+
+test -z "$1" && usage
+ACTION="$1"; shift
+
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="refs/notes/commits"
+
+COMMIT=$(git rev-parse --verify --default HEAD "$@") ||
+die "Invalid commit: $@"
+
+case "$ACTION" in
+edit)
+	if [ "${GIT_NOTES_REF#refs/notes/}" = "$GIT_NOTES_REF" ]; then
+		die "Refusing to edit notes in $GIT_NOTES_REF (outside of refs/notes/)"
+	fi
+
+	MSG_FILE="$GIT_DIR/new-notes-$COMMIT"
+	GIT_INDEX_FILE="$MSG_FILE.idx"
+	export GIT_INDEX_FILE
+
+	trap '
+		test -f "$MSG_FILE" && rm "$MSG_FILE"
+		test -f "$GIT_INDEX_FILE" && rm "$GIT_INDEX_FILE"
+	' 0
+
+	GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MSG_FILE"
+
+	CURRENT_HEAD=$(git show-ref "$GIT_NOTES_REF" | cut -f 1 -d ' ')
+	if [ -z "$CURRENT_HEAD" ]; then
+		PARENT=
+	else
+		PARENT="-p $CURRENT_HEAD"
+		git read-tree "$GIT_NOTES_REF" || die "Could not read index"
+		git cat-file blob :$COMMIT >> "$MSG_FILE" 2> /dev/null
+	fi
+
+	core_editor="$(git config core.editor)"
+	${GIT_EDITOR:-${core_editor:-${VISUAL:-${EDITOR:-vi}}}} "$MSG_FILE"
+
+	grep -v ^# < "$MSG_FILE" | git stripspace > "$MSG_FILE".processed
+	mv "$MSG_FILE".processed "$MSG_FILE"
+	if [ -s "$MSG_FILE" ]; then
+		BLOB=$(git hash-object -w "$MSG_FILE") ||
+			die "Could not write into object database"
+		git update-index --add --cacheinfo 0644 $BLOB $COMMIT ||
+			die "Could not write index"
+	else
+		test -z "$CURRENT_HEAD" &&
+			die "Will not initialise with empty tree"
+		git update-index --force-remove $COMMIT ||
+			die "Could not update index"
+	fi
+
+	TREE=$(git write-tree) || die "Could not write tree"
+	NEW_HEAD=$(echo Annotate $COMMIT | git commit-tree $TREE $PARENT) ||
+		die "Could not annotate"
+	git update-ref -m "Annotate $COMMIT" \
+		"$GIT_NOTES_REF" $NEW_HEAD $CURRENT_HEAD
+;;
+show)
+	git rev-parse -q --verify "$GIT_NOTES_REF":$COMMIT > /dev/null ||
+		die "No note for commit $COMMIT."
+	git show "$GIT_NOTES_REF":$COMMIT
+;;
+*)
+	usage
+esac
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
new file mode 100755
index 0000000..73e53be
--- /dev/null
+++ b/t/t3301-notes.sh
@@ -0,0 +1,114 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes'
+
+. ./test-lib.sh
+
+cat > fake_editor.sh << \EOF
+echo "$MSG" > "$1"
+echo "$MSG" >& 2
+EOF
+chmod a+x fake_editor.sh
+VISUAL=./fake_editor.sh
+export VISUAL
+
+test_expect_success 'cannot annotate non-existing HEAD' '
+	(MSG=3 && export MSG && test_must_fail git notes edit)
+'
+
+test_expect_success setup '
+	: > a1 &&
+	git add a1 &&
+	test_tick &&
+	git commit -m 1st &&
+	: > a2 &&
+	git add a2 &&
+	test_tick &&
+	git commit -m 2nd
+'
+
+test_expect_success 'need valid notes ref' '
+	(MSG=1 GIT_NOTES_REF=/ && export MSG GIT_NOTES_REF &&
+	 test_must_fail git notes edit) &&
+	(MSG=2 GIT_NOTES_REF=/ && export MSG GIT_NOTES_REF &&
+	 test_must_fail git notes show)
+'
+
+test_expect_success 'refusing to edit in refs/heads/' '
+	(MSG=1 GIT_NOTES_REF=refs/heads/bogus &&
+	 export MSG GIT_NOTES_REF &&
+	 test_must_fail git notes edit)
+'
+
+test_expect_success 'refusing to edit in refs/remotes/' '
+	(MSG=1 GIT_NOTES_REF=refs/remotes/bogus &&
+	 export MSG GIT_NOTES_REF &&
+	 test_must_fail git notes edit)
+'
+
+# 1 indicates caught gracefully by die, 128 means git-show barked
+test_expect_success 'handle empty notes gracefully' '
+	git notes show ; test 1 = $?
+'
+
+test_expect_success 'create notes' '
+	git config core.notesRef refs/notes/commits &&
+	MSG=b1 git notes edit &&
+	test ! -f .git/new-notes &&
+	test 1 = $(git ls-tree refs/notes/commits | wc -l) &&
+	test b1 = $(git notes show) &&
+	git show HEAD^ &&
+	test_must_fail git notes show HEAD^
+'
+
+cat > expect << EOF
+commit 268048bfb8a1fb38e703baceb8ab235421bf80c5
+Author: A U Thor <author@example.com>
+Date:   Thu Apr 7 15:14:13 2005 -0700
+
+    2nd
+
+Notes:
+    b1
+EOF
+
+test_expect_success 'show notes' '
+	! (git cat-file commit HEAD | grep b1) &&
+	git log -1 > output &&
+	test_cmp expect output
+'
+test_expect_success 'create multi-line notes (setup)' '
+	: > a3 &&
+	git add a3 &&
+	test_tick &&
+	git commit -m 3rd &&
+	MSG="b3
+c3c3c3c3
+d3d3d3" git notes edit
+'
+
+cat > expect-multiline << EOF
+commit 1584215f1d29c65e99c6c6848626553fdd07fd75
+Author: A U Thor <author@example.com>
+Date:   Thu Apr 7 15:15:13 2005 -0700
+
+    3rd
+
+Notes:
+    b3
+    c3c3c3c3
+    d3d3d3
+EOF
+
+printf "\n" >> expect-multiline
+cat expect >> expect-multiline
+
+test_expect_success 'show multi-line notes' '
+	git log -2 > output &&
+	test_cmp expect-multiline output
+'
+
+test_done
-- 
1.6.4.304.g1365c.dirty


* [PATCHv5 03/14] Speed up git notes lookup
From: Johan Herland @ 2009-09-08  2:26 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce, Johannes Schindelin

From: Johannes Schindelin <Johannes.Schindelin@gmx.de>

To avoid looking up each and every commit in the notes ref's tree
object, which is very expensive, speed things up by slurping the tree
object's contents into a hash_map.

The idea for the hashmap singleton is from David Reiss, initial
benchmarking by Jeff King.

Note: the implementation allows for arbitrary entries in the notes
tree object, ignoring those that do not reference a valid object.  This
allows you to annotate arbitrary branches, or objects.
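
For readers unfamiliar with the technique, here is a standalone sketch of
an open-addressing map with linear probing, in the spirit of the code in
this patch but not identical to it. All names (note_map, note_map_find,
note_map_add, note_map_get) are made up for illustration, and an all-zero
key is assumed to mark an empty slot, so it cannot itself be stored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 20

struct note_entry {
	unsigned char key[KEY_LEN];	/* all-zero key marks an empty slot */
	unsigned char value[KEY_LEN];
};

struct note_map {
	struct note_entry *entries;
	size_t count, size;
};

static int key_is_null(const unsigned char *key)
{
	static const unsigned char zero[KEY_LEN];
	return !memcmp(key, zero, KEY_LEN);
}

/* Return the slot holding key, or -1 - i where i is the free slot. */
long note_map_find(const struct note_map *map, const unsigned char *key)
{
	unsigned int h;
	size_t i;

	assert(map->size > 0);
	memcpy(&h, key, sizeof(h));	/* first bytes of the key as hash */
	i = h % map->size;
	for (;;) {
		const unsigned char *cur = map->entries[i].key;

		if (!memcmp(key, cur, KEY_LEN))
			return (long)i;
		if (key_is_null(cur))
			return -1 - (long)i;
		if (++i == map->size)
			i = 0;		/* wrap around */
	}
}

/* Double the table (or start at 64 slots) and re-insert old entries. */
static void note_map_grow(struct note_map *map)
{
	size_t i, old_size = map->size;
	struct note_entry *old = map->entries;

	map->size = old_size ? old_size * 2 : 64;
	map->entries = calloc(map->size, sizeof(*map->entries));
	if (!map->entries)
		abort();
	for (i = 0; i < old_size; i++)
		if (!key_is_null(old[i].key)) {
			long slot = -1 - note_map_find(map, old[i].key);
			map->entries[slot] = old[i];
		}
	free(old);
}

/* Insert or overwrite; the table is kept at most half full. */
void note_map_add(struct note_map *map, const unsigned char *key,
		  const unsigned char *value)
{
	long slot;

	if (map->count + 1 > map->size / 2)
		note_map_grow(map);
	slot = note_map_find(map, key);
	if (slot < 0) {
		slot = -1 - slot;
		map->count++;
	}
	memcpy(map->entries[slot].key, key, KEY_LEN);
	memcpy(map->entries[slot].value, value, KEY_LEN);
}

/* Return the stored value for key, or NULL if there is none. */
const unsigned char *note_map_get(const struct note_map *map,
				  const unsigned char *key)
{
	long slot;

	if (!map->size)
		return NULL;
	slot = note_map_find(map, key);
	return slot < 0 ? NULL : map->entries[slot].value;
}
```

Keeping the table at most half full guarantees that a probe always hits
an empty slot, so the lookup loop terminates. Encoding "not found" as
-1 - i lets the caller reuse the probe result as the insertion slot.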

This patch has been improved by the following contributions:
- Junio C Hamano: fixed an obvious error in initialize_hash_map()

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 notes.c |  112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 102 insertions(+), 10 deletions(-)

diff --git a/notes.c b/notes.c
index 401966d..9172154 100644
--- a/notes.c
+++ b/notes.c
@@ -4,15 +4,112 @@
 #include "refs.h"
 #include "utf8.h"
 #include "strbuf.h"
+#include "tree-walk.h"
+
+struct entry {
+	unsigned char commit_sha1[20];
+	unsigned char notes_sha1[20];
+};
+
+struct hash_map {
+	struct entry *entries;
+	off_t count, size;
+};
 
 static int initialized;
+static struct hash_map hash_map;
+
+static int hash_index(struct hash_map *map, const unsigned char *sha1)
+{
+	int i = ((*(unsigned int *)sha1) % map->size);
+
+	for (;;) {
+		unsigned char *current = map->entries[i].commit_sha1;
+
+		if (!hashcmp(sha1, current))
+			return i;
+
+		if (is_null_sha1(current))
+			return -1 - i;
+
+		if (++i == map->size)
+			i = 0;
+	}
+}
+
+static void add_entry(const unsigned char *commit_sha1,
+		const unsigned char *notes_sha1)
+{
+	int index;
+
+	if (hash_map.count + 1 > hash_map.size >> 1) {
+		int i, old_size = hash_map.size;
+		struct entry *old = hash_map.entries;
+
+		hash_map.size = old_size ? old_size << 1 : 64;
+		hash_map.entries = (struct entry *)
+			xcalloc(sizeof(struct entry), hash_map.size);
+
+		for (i = 0; i < old_size; i++)
+			if (!is_null_sha1(old[i].commit_sha1)) {
+				index = -1 - hash_index(&hash_map,
+						old[i].commit_sha1);
+				memcpy(hash_map.entries + index, old + i,
+					sizeof(struct entry));
+			}
+		free(old);
+	}
+
+	index = hash_index(&hash_map, commit_sha1);
+	if (index < 0) {
+		index = -1 - index;
+		hash_map.count++;
+	}
+
+	hashcpy(hash_map.entries[index].commit_sha1, commit_sha1);
+	hashcpy(hash_map.entries[index].notes_sha1, notes_sha1);
+}
+
+static void initialize_hash_map(const char *notes_ref_name)
+{
+	unsigned char sha1[20], commit_sha1[20];
+	unsigned mode;
+	struct tree_desc desc;
+	struct name_entry entry;
+	void *buf;
+
+	if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
+	    get_tree_entry(commit_sha1, "", sha1, &mode))
+		return;
+
+	buf = fill_tree_descriptor(&desc, sha1);
+	if (!buf)
+		die("Could not read %s for notes-index", sha1_to_hex(sha1));
+
+	while (tree_entry(&desc, &entry))
+		if (!get_sha1(entry.path, commit_sha1))
+			add_entry(commit_sha1, entry.sha1);
+	free(buf);
+}
+
+static unsigned char *lookup_notes(const unsigned char *commit_sha1)
+{
+	int index;
+
+	if (!hash_map.size)
+		return NULL;
+
+	index = hash_index(&hash_map, commit_sha1);
+	if (index < 0)
+		return NULL;
+	return hash_map.entries[index].notes_sha1;
+}
 
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 		const char *output_encoding)
 {
 	static const char utf8[] = "utf-8";
-	struct strbuf name = STRBUF_INIT;
-	unsigned char sha1[20];
+	unsigned char *sha1;
 	char *msg, *msg_p;
 	unsigned long linelen, msglen;
 	enum object_type type;
@@ -23,17 +120,12 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 			notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
 		else if (!notes_ref_name)
 			notes_ref_name = GIT_NOTES_DEFAULT_REF;
-		if (notes_ref_name && read_ref(notes_ref_name, sha1))
-			notes_ref_name = NULL;
+		initialize_hash_map(notes_ref_name);
 		initialized = 1;
 	}
 
-	if (!notes_ref_name)
-		return;
-
-	strbuf_addf(&name, "%s:%s", notes_ref_name,
-			sha1_to_hex(commit->object.sha1));
-	if (get_sha1(name.buf, sha1))
+	sha1 = lookup_notes(commit->object.sha1);
+	if (!sha1)
 		return;
 
 	if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
-- 
1.6.4.304.g1365c.dirty


* [PATCHv5 04/14] Add an expensive test for git-notes
From: Johan Herland @ 2009-09-08  2:26 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce, Johannes Schindelin

From: Johannes Schindelin <Johannes.Schindelin@gmx.de>

git-notes has the potential to be pretty expensive, so test with
a lot of commits.  A lot.  To keep regular test runs cheap, you have
to opt in explicitly by setting the environment variable
GIT_NOTES_TIMING_TESTS.

This patch has been improved by the following contributions:
- Junio C Hamano: tests: fix "export var=val"

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t3302-notes-index-expensive.sh |   98 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 98 insertions(+), 0 deletions(-)
 create mode 100755 t/t3302-notes-index-expensive.sh

diff --git a/t/t3302-notes-index-expensive.sh b/t/t3302-notes-index-expensive.sh
new file mode 100755
index 0000000..0ef3e95
--- /dev/null
+++ b/t/t3302-notes-index-expensive.sh
@@ -0,0 +1,98 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes index (expensive!)'
+
+. ./test-lib.sh
+
+test -z "$GIT_NOTES_TIMING_TESTS" && {
+	say Skipping timing tests
+	test_done
+	exit
+}
+
+create_repo () {
+	number_of_commits=$1
+	nr=0
+	parent=
+	test -d .git || {
+	git init &&
+	tree=$(git write-tree) &&
+	while [ $nr -lt $number_of_commits ]; do
+		test_tick &&
+		commit=$(echo $nr | git commit-tree $tree $parent) ||
+			return
+		parent="-p $commit"
+		nr=$(($nr+1))
+	done &&
+	git update-ref refs/heads/master $commit &&
+	{
+		GIT_INDEX_FILE=.git/temp; export GIT_INDEX_FILE;
+		git rev-list HEAD | cat -n | sed "s/^[ 	][ 	]*/ /g" |
+		while read nr sha1; do
+			blob=$(echo note $nr | git hash-object -w --stdin) &&
+			echo $sha1 | sed "s/^/0644 $blob 0	/"
+		done | git update-index --index-info &&
+		tree=$(git write-tree) &&
+		test_tick &&
+		commit=$(echo notes | git commit-tree $tree) &&
+		git update-ref refs/notes/commits $commit
+	} &&
+	git config core.notesRef refs/notes/commits
+	}
+}
+
+test_notes () {
+	count=$1 &&
+	git config core.notesRef refs/notes/commits &&
+	git log | grep "^    " > output &&
+	i=1 &&
+	while [ $i -le $count ]; do
+		echo "    $(($count-$i))" &&
+		echo "    note $i" &&
+		i=$(($i+1));
+	done > expect &&
+	git diff expect output
+}
+
+cat > time_notes << \EOF
+	mode=$1
+	i=1
+	while [ $i -lt $2 ]; do
+		case $1 in
+		no-notes)
+			GIT_NOTES_REF=non-existing; export GIT_NOTES_REF
+		;;
+		notes)
+			unset GIT_NOTES_REF
+		;;
+		esac
+		git log >/dev/null
+		i=$(($i+1))
+	done
+EOF
+
+time_notes () {
+	for mode in no-notes notes
+	do
+		echo $mode
+		/usr/bin/time sh ../time_notes $mode $1
+	done
+}
+
+for count in 10 100 1000 10000; do
+
+	mkdir $count
+	(cd $count;
+
+	test_expect_success "setup $count" "create_repo $count"
+
+	test_expect_success 'notes work' "test_notes $count"
+
+	test_expect_success 'notes timing' "time_notes 100"
+	)
+done
+
+test_done
-- 
1.6.4.304.g1365c.dirty


* [PATCHv5 05/14] Teach "-m <msg>" and "-F <file>" to "git notes edit"
From: Johan Herland @ 2009-09-08  2:26 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

The "-m" and "-F" options are already the established method
(in both git-commit and git-tag) to specify a commit/tag message
without invoking the editor. This patch teaches "git notes edit"
to respect the same options for specifying a notes message without
invoking the editor.

Multiple "-m" and/or "-F" options are concatenated as separate
paragraphs.
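
The concatenation rule can be sketched in C as below. This is an
illustration only: the patch implements it in shell inside git-notes.sh,
and join_paragraphs() is a hypothetical name. As in the script, a
separator is added only once some text has already been accumulated.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: join n message parts, separating them with a
 * blank line ("\n\n") so that each part becomes its own paragraph.
 * Returns a malloc()ed string that the caller must free, or NULL if
 * allocation fails.
 */
char *join_paragraphs(const char *const *parts, size_t n)
{
	size_t i, len = 0;
	char *out, *p;

	for (i = 0; i < n; i++)
		len += strlen(parts[i]) + 2;	/* room for a separator */
	out = malloc(len + 1);
	if (!out)
		return NULL;

	p = out;
	for (i = 0; i < n; i++) {
		size_t part_len = strlen(parts[i]);

		if (p != out) {		/* something accumulated already */
			memcpy(p, "\n\n", 2);
			p += 2;
		}
		memcpy(p, parts[i], part_len);
		p += part_len;
	}
	*p = '\0';
	return out;
}
```

With the parts "first" and "second", as from "-m first -m second", the
result is "first", an empty line, then "second".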

The patch also updates the "git notes" documentation and adds
selftests for the new functionality. Unfortunately, the added
selftests include a couple of lines with trailing whitespace
(without these the test will fail). This may cause git to warn
about "whitespace errors".

This patch has been improved by the following contributions:
- Thomas Rast: fix trailing whitespace in t3301

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/git-notes.txt |   16 ++++++++++-
 git-notes.sh                |   64 +++++++++++++++++++++++++++++++++++++-----
 t/t3301-notes.sh            |   36 ++++++++++++++++++++++++
 3 files changed, 107 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt
index 7136016..94cceb1 100644
--- a/Documentation/git-notes.txt
+++ b/Documentation/git-notes.txt
@@ -8,7 +8,7 @@ git-notes - Add/inspect commit notes
 SYNOPSIS
 --------
 [verse]
-'git-notes' (edit | show) [commit]
+'git-notes' (edit [-F <file> | -m <msg>] | show) [commit]
 
 DESCRIPTION
 -----------
@@ -33,6 +33,20 @@ show::
 	Show the notes for a given commit (defaults to HEAD).
 
 
+OPTIONS
+-------
+-m <msg>::
+	Use the given note message (instead of prompting).
+	If multiple `-m` (or `-F`) options are given, their
+	values are concatenated as separate paragraphs.
+
+-F <file>::
+	Take the note message from the given file.  Use '-' to
+	read the note message from the standard input.
+	If multiple `-F` (or `-m`) options are given, their
+	values are concatenated as separate paragraphs.
+
+
 Author
 ------
 Written by Johannes Schindelin <johannes.schindelin@gmx.de>
diff --git a/git-notes.sh b/git-notes.sh
index f06c254..e642e47 100755
--- a/git-notes.sh
+++ b/git-notes.sh
@@ -1,16 +1,59 @@
 #!/bin/sh
 
-USAGE="(edit | show) [commit]"
+USAGE="(edit [-F <file> | -m <msg>] | show) [commit]"
 . git-sh-setup
 
-test -n "$3" && usage
-
 test -z "$1" && usage
 ACTION="$1"; shift
 
 test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
 test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="refs/notes/commits"
 
+MESSAGE=
+while test $# != 0
+do
+	case "$1" in
+	-m)
+		test "$ACTION" = "edit" || usage
+		shift
+		if test "$#" = "0"; then
+			die "error: option -m needs an argument"
+		else
+			if [ -z "$MESSAGE" ]; then
+				MESSAGE="$1"
+			else
+				MESSAGE="$MESSAGE
+
+$1"
+			fi
+			shift
+		fi
+		;;
+	-F)
+		test "$ACTION" = "edit" || usage
+		shift
+		if test "$#" = "0"; then
+			die "error: option -F needs an argument"
+		else
+			if [ -z "$MESSAGE" ]; then
+				MESSAGE="$(cat "$1")"
+			else
+				MESSAGE="$MESSAGE
+
+$(cat "$1")"
+			fi
+			shift
+		fi
+		;;
+	-*)
+		usage
+		;;
+	*)
+		break
+		;;
+	esac
+done
+
 COMMIT=$(git rev-parse --verify --default HEAD "$@") ||
 die "Invalid commit: $@"
 
@@ -29,19 +72,24 @@ edit)
 		test -f "$GIT_INDEX_FILE" && rm "$GIT_INDEX_FILE"
 	' 0
 
-	GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MSG_FILE"
-
 	CURRENT_HEAD=$(git show-ref "$GIT_NOTES_REF" | cut -f 1 -d ' ')
 	if [ -z "$CURRENT_HEAD" ]; then
 		PARENT=
 	else
 		PARENT="-p $CURRENT_HEAD"
 		git read-tree "$GIT_NOTES_REF" || die "Could not read index"
-		git cat-file blob :$COMMIT >> "$MSG_FILE" 2> /dev/null
 	fi
 
-	core_editor="$(git config core.editor)"
-	${GIT_EDITOR:-${core_editor:-${VISUAL:-${EDITOR:-vi}}}} "$MSG_FILE"
+	if [ -z "$MESSAGE" ]; then
+		GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MSG_FILE"
+		if [ ! -z "$CURRENT_HEAD" ]; then
+			git cat-file blob :$COMMIT >> "$MSG_FILE" 2> /dev/null
+		fi
+		core_editor="$(git config core.editor)"
+		${GIT_EDITOR:-${core_editor:-${VISUAL:-${EDITOR:-vi}}}} "$MSG_FILE"
+	else
+		echo "$MESSAGE" > "$MSG_FILE"
+	fi
 
 	grep -v ^# < "$MSG_FILE" | git stripspace > "$MSG_FILE".processed
 	mv "$MSG_FILE".processed "$MSG_FILE"
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
index 73e53be..1e34f48 100755
--- a/t/t3301-notes.sh
+++ b/t/t3301-notes.sh
@@ -110,5 +110,41 @@ test_expect_success 'show multi-line notes' '
 	git log -2 > output &&
 	test_cmp expect-multiline output
 '
+test_expect_success 'create -m and -F notes (setup)' '
+	: > a4 &&
+	git add a4 &&
+	test_tick &&
+	git commit -m 4th &&
+	echo "xyzzy" > note5 &&
+	git notes edit -m spam -F note5 -m "foo
+bar
+baz"
+'
+
+whitespace="    "
+cat > expect-m-and-F << EOF
+commit 15023535574ded8b1a89052b32673f84cf9582b8
+Author: A U Thor <author@example.com>
+Date:   Thu Apr 7 15:16:13 2005 -0700
+
+    4th
+
+Notes:
+    spam
+$whitespace
+    xyzzy
+$whitespace
+    foo
+    bar
+    baz
+EOF
+
+printf "\n" >> expect-m-and-F
+cat expect-multiline >> expect-m-and-F
+
+test_expect_success 'show -m and -F notes' '
+	git log -3 > output &&
+	test_cmp expect-m-and-F output
+'
 
 test_done
-- 
1.6.4.304.g1365c.dirty


* [PATCHv5 06/14] fast-import: Add support for importing commit notes
  2009-09-08  2:26 [PATCHv5 00/14] git notes Johan Herland
                   ` (4 preceding siblings ...)
  2009-09-08  2:26 ` [PATCHv5 05/14] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
@ 2009-09-08  2:26 ` Johan Herland
  2009-09-08  2:26 ` [PATCHv5 07/14] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-08  2:26 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

Introduce a 'notemodify' subcommand of the 'commit' command. This subcommand
is similar to 'filemodify', except that no mode is supplied (all notes have
mode 0644), and the path is set to the hex SHA1 of the given "committish".

This enables fast import of note objects along with their associated commits,
since the notes can now be named using the mark references of their
corresponding commits.
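
To illustrate what "the path is set to the hex SHA1" means, here is a
minimal sketch of the conversion from a 20-byte SHA1 to its 40-character
path name. The helper name sha1_to_hex_path() is hypothetical; the patch
itself simply calls git's existing sha1_to_hex().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the fast-import code): write the
 * 40-character lowercase hex form of a 20-byte SHA1 into out,
 * NUL-terminated. out must have room for 41 bytes.
 */
void sha1_to_hex_path(const unsigned char sha1[20], char out[41])
{
	static const char hex[] = "0123456789abcdef";
	size_t i;

	assert(sha1 && out);
	for (i = 0; i < 20; i++) {
		out[2 * i] = hex[sha1[i] >> 4];       /* high nibble first */
		out[2 * i + 1] = hex[sha1[i] & 0x0f]; /* then low nibble */
	}
	out[40] = '\0';
}
```

The resulting string is what ends up as the entry name in the notes tree,
so a note for a commit can later be read back as <notes-ref>:<hex SHA1>.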

The patch also includes a test case of the added functionality.

Signed-off-by: Johan Herland <johan@herland.net>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
---
 Documentation/git-fast-import.txt |   45 +++++++++--
 fast-import.c                     |   88 +++++++++++++++++++-
 t/t9300-fast-import.sh            |  166 +++++++++++++++++++++++++++++++++++++
 3 files changed, 289 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index f1c94b4..bb198c2 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -325,7 +325,7 @@ change to the project.
 	data
 	('from' SP <committish> LF)?
 	('merge' SP <committish> LF)?
-	(filemodify | filedelete | filecopy | filerename | filedeleteall)*
+	(filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
 	LF?
 ....
 
@@ -348,14 +348,13 @@ commit message use a 0 length data.  Commit messages are free-form
 and are not interpreted by Git.  Currently they must be encoded in
 UTF-8, as fast-import does not permit other encodings to be specified.
 
-Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`
-and `filedeleteall` commands
+Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`,
+`filedeleteall` and `notemodify` commands
 may be included to update the contents of the branch prior to
 creating the commit.  These commands may be supplied in any order.
 However it is recommended that a `filedeleteall` command precede
-all `filemodify`, `filecopy` and `filerename` commands in the same
-commit, as `filedeleteall`
-wipes the branch clean (see below).
+all `filemodify`, `filecopy`, `filerename` and `notemodify` commands in
+the same commit, as `filedeleteall` wipes the branch clean (see below).
 
 The `LF` after the command is optional (it used to be required).
 
@@ -604,6 +603,40 @@ more memory per active branch (less than 1 MiB for even most large
 projects); so frontends that can easily obtain only the affected
 paths for a commit are encouraged to do so.
 
+`notemodify`
+^^^^^^^^^^^^
+Included in a `commit` command to add a new note (annotating a given
+commit) or change the content of an existing note.  This command has
+two different means of specifying the content of the note.
+
+External data format::
+	The data content for the note was already supplied by a prior
+	`blob` command.  The frontend just needs to connect it to the
+	commit that is to be annotated.
++
+....
+	'N' SP <dataref> SP <committish> LF
+....
++
+Here `<dataref>` can be either a mark reference (`:<idnum>`)
+set by a prior `blob` command, or a full 40-byte SHA-1 of an
+existing Git blob object.
+
+Inline data format::
+	The data content for the note has not been supplied yet.
+	The frontend wants to supply it as part of this modify
+	command.
++
+....
+	'N' SP 'inline' SP <committish> LF
+	data
+....
++
+See below for a detailed description of the `data` command.
+
+In both formats `<committish>` is any of the commit specification
+expressions also accepted by `from` (see above).
+
 `mark`
 ~~~~~~
 Arranges for fast-import to save a reference to the current object, allowing
diff --git a/fast-import.c b/fast-import.c
index dcfb8fa..1e91358 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -22,8 +22,8 @@ Format of STDIN stream:
     ('author' sp name sp '<' email '>' sp when lf)?
     'committer' sp name sp '<' email '>' sp when lf
     commit_msg
-    ('from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)?
-    ('merge' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)*
+    ('from' sp committish lf)?
+    ('merge' sp committish lf)*
     file_change*
     lf?;
   commit_msg ::= data;
@@ -41,15 +41,18 @@ Format of STDIN stream:
   file_obm ::= 'M' sp mode sp (hexsha1 | idnum) sp path_str lf;
   file_inm ::= 'M' sp mode sp 'inline' sp path_str lf
     data;
+  note_obm ::= 'N' sp (hexsha1 | idnum) sp committish lf;
+  note_inm ::= 'N' sp 'inline' sp committish lf
+    data;
 
   new_tag ::= 'tag' sp tag_str lf
-    'from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf
+    'from' sp committish lf
     ('tagger' sp name sp '<' email '>' sp when lf)?
     tag_msg;
   tag_msg ::= data;
 
   reset_branch ::= 'reset' sp ref_str lf
-    ('from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)?
+    ('from' sp committish lf)?
     lf?;
 
   checkpoint ::= 'checkpoint' lf
@@ -88,6 +91,7 @@ Format of STDIN stream:
      # stream formatting is: \, " and LF.  Otherwise these values
      # are UTF8.
      #
+  committish  ::= (ref_str | hexsha1 | sha1exp_str | idnum);
   ref_str     ::= ref;
   sha1exp_str ::= sha1exp;
   tag_str     ::= tag;
@@ -2053,6 +2057,80 @@ static void file_change_cr(struct branch *b, int rename)
 		leaf.tree);
 }
 
+static void note_change_n(struct branch *b)
+{
+	const char *p = command_buf.buf + 2;
+	static struct strbuf uq = STRBUF_INIT;
+	struct object_entry *oe = oe;
+	struct branch *s;
+	unsigned char sha1[20], commit_sha1[20];
+	uint16_t inline_data = 0;
+
+	/* <dataref> or 'inline' */
+	if (*p == ':') {
+		char *x;
+		oe = find_mark(strtoumax(p + 1, &x, 10));
+		hashcpy(sha1, oe->sha1);
+		p = x;
+	} else if (!prefixcmp(p, "inline")) {
+		inline_data = 1;
+		p += 6;
+	} else {
+		if (get_sha1_hex(p, sha1))
+			die("Invalid SHA1: %s", command_buf.buf);
+		oe = find_object(sha1);
+		p += 40;
+	}
+	if (*p++ != ' ')
+		die("Missing space after SHA1: %s", command_buf.buf);
+
+	/* <committish> */
+	s = lookup_branch(p);
+	if (s) {
+		hashcpy(commit_sha1, s->sha1);
+	} else if (*p == ':') {
+		uintmax_t commit_mark = strtoumax(p + 1, NULL, 10);
+		struct object_entry *commit_oe = find_mark(commit_mark);
+		if (commit_oe->type != OBJ_COMMIT)
+			die("Mark :%" PRIuMAX " not a commit", commit_mark);
+		hashcpy(commit_sha1, commit_oe->sha1);
+	} else if (!get_sha1(p, commit_sha1)) {
+		unsigned long size;
+		char *buf = read_object_with_reference(commit_sha1,
+			commit_type, &size, commit_sha1);
+		if (!buf || size < 46)
+			die("Not a valid commit: %s", p);
+		free(buf);
+	} else
+		die("Invalid ref name or SHA1 expression: %s", p);
+
+	if (inline_data) {
+		static struct strbuf buf = STRBUF_INIT;
+
+		if (p != uq.buf) {
+			strbuf_addstr(&uq, p);
+			p = uq.buf;
+		}
+		read_next_command();
+		parse_data(&buf);
+		store_object(OBJ_BLOB, &buf, &last_blob, sha1, 0);
+	} else if (oe) {
+		if (oe->type != OBJ_BLOB)
+			die("Not a blob (actually a %s): %s",
+				typename(oe->type), command_buf.buf);
+	} else {
+		enum object_type type = sha1_object_info(sha1, NULL);
+		if (type < 0)
+			die("Blob not found: %s", command_buf.buf);
+		if (type != OBJ_BLOB)
+			die("Not a blob (actually a %s): %s",
+			    typename(type), command_buf.buf);
+	}
+
+	tree_content_set(&b->branch_tree, sha1_to_hex(commit_sha1), sha1,
+		S_IFREG | 0644, NULL);
+}
+
 static void file_change_deleteall(struct branch *b)
 {
 	release_tree_content_recursive(b->branch_tree.tree);
@@ -2222,6 +2300,8 @@ static void parse_new_commit(void)
 			file_change_cr(b, 1);
 		else if (!prefixcmp(command_buf.buf, "C "))
 			file_change_cr(b, 0);
+		else if (!prefixcmp(command_buf.buf, "N "))
+			note_change_n(b);
 		else if (!strcmp("deleteall", command_buf.buf))
 			file_change_deleteall(b);
 		else {
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index d33fc55..2f5c323 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1089,6 +1089,172 @@ test_expect_success 'P: fail on blob mark in gitlink' '
     test_must_fail git fast-import <input'
 
 ###
+### series Q (notes)
+###
+
+note1_data="Note for the first commit"
+note2_data="Note for the second commit"
+note3_data="Note for the third commit"
+
+test_tick
+cat >input <<INPUT_END
+blob
+mark :2
+data <<EOF
+$file2_data
+EOF
+
+commit refs/heads/notes-test
+mark :3
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+first (:3)
+COMMIT
+
+M 644 :2 file2
+
+blob
+mark :4
+data $file4_len
+$file4_data
+commit refs/heads/notes-test
+mark :5
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+second (:5)
+COMMIT
+
+M 644 :4 file4
+
+commit refs/heads/notes-test
+mark :6
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+third (:6)
+COMMIT
+
+M 644 inline file5
+data <<EOF
+$file5_data
+EOF
+
+M 755 inline file6
+data <<EOF
+$file6_data
+EOF
+
+blob
+mark :7
+data <<EOF
+$note1_data
+EOF
+
+blob
+mark :8
+data <<EOF
+$note2_data
+EOF
+
+commit refs/notes/foobar
+mark :9
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+notes (:9)
+COMMIT
+
+N :7 :3
+N :8 :5
+N inline :6
+data <<EOF
+$note3_data
+EOF
+
+INPUT_END
+test_expect_success \
+	'Q: commit notes' \
+	'git fast-import <input &&
+	 git whatchanged notes-test'
+test_expect_success \
+	'Q: verify pack' \
+	'for p in .git/objects/pack/*.pack;do git verify-pack $p||exit;done'
+
+commit1=$(git rev-parse notes-test~2)
+commit2=$(git rev-parse notes-test^)
+commit3=$(git rev-parse notes-test)
+
+cat >expect <<EOF
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+first (:3)
+EOF
+test_expect_success \
+	'Q: verify first commit' \
+	'git cat-file commit notes-test~2 | sed 1d >actual &&
+	test_cmp expect actual'
+
+cat >expect <<EOF
+parent $commit1
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+second (:5)
+EOF
+test_expect_success \
+	'Q: verify second commit' \
+	'git cat-file commit notes-test^ | sed 1d >actual &&
+	test_cmp expect actual'
+
+cat >expect <<EOF
+parent $commit2
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+third (:6)
+EOF
+test_expect_success \
+	'Q: verify third commit' \
+	'git cat-file commit notes-test | sed 1d >actual &&
+	test_cmp expect actual'
+
+cat >expect <<EOF
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+notes (:9)
+EOF
+test_expect_success \
+	'Q: verify notes commit' \
+	'git cat-file commit refs/notes/foobar | sed 1d >actual &&
+	test_cmp expect actual'
+
+cat >expect.unsorted <<EOF
+100644 blob $commit1
+100644 blob $commit2
+100644 blob $commit3
+EOF
+cat expect.unsorted | sort >expect
+test_expect_success \
+	'Q: verify notes tree' \
+	'git cat-file -p refs/notes/foobar^{tree} | sed "s/ [0-9a-f]*	/ /" >actual &&
+	 test_cmp expect actual'
+
+echo "$note1_data" >expect
+test_expect_success \
+	'Q: verify note for first commit' \
+	'git cat-file blob refs/notes/foobar:$commit1 >actual && test_cmp expect actual'
+
+echo "$note2_data" >expect
+test_expect_success \
+	'Q: verify note for second commit' \
+	'git cat-file blob refs/notes/foobar:$commit2 >actual && test_cmp expect actual'
+
+echo "$note3_data" >expect
+test_expect_success \
+	'Q: verify note for third commit' \
+	'git cat-file blob refs/notes/foobar:$commit3 >actual && test_cmp expect actual'
+
+###
 ### series R (feature and option)
 ###
 
-- 
1.6.4.304.g1365c.dirty


* [PATCHv5 07/14] t3302-notes-index-expensive: Speed up create_repo()
  2009-09-08  2:26 [PATCHv5 00/14] git notes Johan Herland
                   ` (5 preceding siblings ...)
  2009-09-08  2:26 ` [PATCHv5 06/14] fast-import: Add support for importing commit notes Johan Herland
@ 2009-09-08  2:26 ` Johan Herland
  2009-09-08  2:26 ` [PATCHv5 08/14] Add flags to get_commit_notes() to control the format of the note string Johan Herland
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-08  2:26 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

Creating repos with 10/100/1000/10000 commits and notes takes a lot of time.
However, using git-fast-import to do the job is a lot more efficient than
using plumbing commands to do the same.

This patch decreases the overall run-time of this test on my machine from
~3 minutes to ~1 minute.

Signed-off-by: Johan Herland <johan@herland.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t3302-notes-index-expensive.sh |   74 ++++++++++++++++++++++++--------------
 1 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/t/t3302-notes-index-expensive.sh b/t/t3302-notes-index-expensive.sh
index 0ef3e95..ee84fc4 100755
--- a/t/t3302-notes-index-expensive.sh
+++ b/t/t3302-notes-index-expensive.sh
@@ -16,30 +16,50 @@ test -z "$GIT_NOTES_TIMING_TESTS" && {
 create_repo () {
 	number_of_commits=$1
 	nr=0
-	parent=
 	test -d .git || {
 	git init &&
-	tree=$(git write-tree) &&
-	while [ $nr -lt $number_of_commits ]; do
-		test_tick &&
-		commit=$(echo $nr | git commit-tree $tree $parent) ||
-			return
-		parent="-p $commit"
-		nr=$(($nr+1))
-	done &&
-	git update-ref refs/heads/master $commit &&
-	{
-		GIT_INDEX_FILE=.git/temp; export GIT_INDEX_FILE;
-		git rev-list HEAD | cat -n | sed "s/^[ 	][ 	]*/ /g" |
-		while read nr sha1; do
-			blob=$(echo note $nr | git hash-object -w --stdin) &&
-			echo $sha1 | sed "s/^/0644 $blob 0	/"
-		done | git update-index --index-info &&
-		tree=$(git write-tree) &&
+	(
+		while [ $nr -lt $number_of_commits ]; do
+			nr=$(($nr+1))
+			mark=$(($nr+$nr))
+			notemark=$(($mark+1))
+			test_tick &&
+			cat <<INPUT_END &&
+commit refs/heads/master
+mark :$mark
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit #$nr
+COMMIT
+
+M 644 inline file
+data <<EOF
+file in commit #$nr
+EOF
+
+blob
+mark :$notemark
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+			echo "N :$notemark :$mark" >> note_commit
+		done &&
 		test_tick &&
-		commit=$(echo notes | git commit-tree $tree) &&
-		git update-ref refs/notes/commits $commit
-	} &&
+		cat <<INPUT_END &&
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+notes
+COMMIT
+
+INPUT_END
+
+		cat note_commit
+	) |
+	git fast-import --quiet &&
 	git config core.notesRef refs/notes/commits
 	}
 }
@@ -48,13 +68,13 @@ test_notes () {
 	count=$1 &&
 	git config core.notesRef refs/notes/commits &&
 	git log | grep "^    " > output &&
-	i=1 &&
-	while [ $i -le $count ]; do
-		echo "    $(($count-$i))" &&
-		echo "    note $i" &&
-		i=$(($i+1));
+	i=$count &&
+	while [ $i -gt 0 ]; do
+		echo "    commit #$i" &&
+		echo "    note for commit #$i" &&
+		i=$(($i-1));
 	done > expect &&
-	git diff expect output
+	test_cmp expect output
 }
 
 cat > time_notes << \EOF
-- 
1.6.4.304.g1365c.dirty


* [PATCHv5 08/14] Add flags to get_commit_notes() to control the format of the note string
  2009-09-08  2:26 [PATCHv5 00/14] git notes Johan Herland
                   ` (6 preceding siblings ...)
  2009-09-08  2:26 ` [PATCHv5 07/14] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
@ 2009-09-08  2:26 ` Johan Herland
  2009-09-08  2:26 ` [PATCHv5 09/14] Add '%N'-format for pretty-printing commit notes Johan Herland
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-08  2:26 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

This patch adds the following flags to get_commit_notes() for adjusting the
format of the produced note string:
- NOTES_SHOW_HEADER: Print "Notes:" line before the notes contents
- NOTES_INDENT: Indent notes contents by 4 spaces
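
The effect of the two flags can be sketched in isolation. The code below is
a simplified stand-in, not the real get_commit_notes(): it works on a plain
C string and a caller-supplied buffer instead of a commit and a strbuf, and
the names format_note() and put() are hypothetical. Only the flag values
and the output layout are taken from the patch.

```c
#include <assert.h>
#include <string.h>

#define NOTES_SHOW_HEADER 1
#define NOTES_INDENT 2

/* Append n bytes of src to out; return -1 if they do not fit. */
static int put(char *out, size_t size, size_t *len, const char *src, size_t n)
{
	if (*len + n >= size)
		return -1;
	memcpy(out + *len, src, n);
	*len += n;
	out[*len] = '\0';
	return 0;
}

/*
 * Simplified sketch of the flag handling: copy the note text msg into
 * out line by line, honouring the two flags. Returns the number of
 * bytes written, or -1 if out is too small.
 */
int format_note(char *out, size_t size, const char *msg, int flags)
{
	size_t len = 0, msglen;
	const char *p, *end;

	assert(out && msg && size > 0);
	out[0] = '\0';
	msglen = strlen(msg);
	/* ignore one trailing newline, as the patched function does */
	if (msglen && msg[msglen - 1] == '\n')
		msglen--;
	end = msg + msglen;

	if ((flags & NOTES_SHOW_HEADER) && put(out, size, &len, "\nNotes:\n", 8))
		return -1;

	for (p = msg; p < end; ) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));
		size_t linelen = nl ? (size_t)(nl - p) : (size_t)(end - p);

		if ((flags & NOTES_INDENT) && put(out, size, &len, "    ", 4))
			return -1;
		if (put(out, size, &len, p, linelen) || put(out, size, &len, "\n", 1))
			return -1;
		p += linelen + 1;
	}
	return (int)len;
}
```

With both flags set the output matches what "git log" printed before this
patch; with flags == 0 the note text is emitted without the "Notes:" line
and without indentation.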

Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c  |    8 +++++---
 notes.h  |    5 ++++-
 pretty.c |    3 ++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/notes.c b/notes.c
index 9172154..84c30c1 100644
--- a/notes.c
+++ b/notes.c
@@ -106,7 +106,7 @@ static unsigned char *lookup_notes(const unsigned char *commit_sha1)
 }
 
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
-		const char *output_encoding)
+		const char *output_encoding, int flags)
 {
 	static const char utf8[] = "utf-8";
 	unsigned char *sha1;
@@ -146,12 +146,14 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 	if (msglen && msg[msglen - 1] == '\n')
 		msglen--;
 
-	strbuf_addstr(sb, "\nNotes:\n");
+	if (flags & NOTES_SHOW_HEADER)
+		strbuf_addstr(sb, "\nNotes:\n");
 
 	for (msg_p = msg; msg_p < msg + msglen; msg_p += linelen + 1) {
 		linelen = strchrnul(msg_p, '\n') - msg_p;
 
-		strbuf_addstr(sb, "    ");
+		if (flags & NOTES_INDENT)
+			strbuf_addstr(sb, "    ");
 		strbuf_add(sb, msg_p, linelen);
 		strbuf_addch(sb, '\n');
 	}
diff --git a/notes.h b/notes.h
index 79d21b6..7f3eed4 100644
--- a/notes.h
+++ b/notes.h
@@ -1,7 +1,10 @@
 #ifndef NOTES_H
 #define NOTES_H
 
+#define NOTES_SHOW_HEADER 1
+#define NOTES_INDENT 2
+
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
-		const char *output_encoding);
+		const char *output_encoding, int flags);
 
 #endif
diff --git a/pretty.c b/pretty.c
index e25db81..01eadd0 100644
--- a/pretty.c
+++ b/pretty.c
@@ -978,7 +978,8 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
 		strbuf_addch(sb, '\n');
 
 	if (fmt != CMIT_FMT_ONELINE)
-		get_commit_notes(commit, sb, encoding);
+		get_commit_notes(commit, sb, encoding,
+				 NOTES_SHOW_HEADER | NOTES_INDENT);
 
 	free(reencoded);
 }
-- 
1.6.4.304.g1365c.dirty


* [PATCHv5 09/14] Add '%N'-format for pretty-printing commit notes
  2009-09-08  2:26 [PATCHv5 00/14] git notes Johan Herland
                   ` (7 preceding siblings ...)
  2009-09-08  2:26 ` [PATCHv5 08/14] Add flags to get_commit_notes() to control the format of the note string Johan Herland
@ 2009-09-08  2:26 ` Johan Herland
  2009-09-08  2:26 ` [PATCHv5 10/14] Teach notes code to free its internal data structures on request Johan Herland
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-08  2:26 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

From: Johannes Schindelin <Johannes.Schindelin@gmx.de>

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/pretty-formats.txt |    1 +
 pretty.c                         |    4 ++++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index 2a845b1..5fb10b3 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -123,6 +123,7 @@ The placeholders are:
 - '%s': subject
 - '%f': sanitized subject line, suitable for a filename
 - '%b': body
+- '%N': commit notes
 - '%Cred': switch color to red
 - '%Cgreen': switch color to green
 - '%Cblue': switch color to blue
diff --git a/pretty.c b/pretty.c
index 01eadd0..7f350bb 100644
--- a/pretty.c
+++ b/pretty.c
@@ -702,6 +702,10 @@ static size_t format_commit_item(struct strbuf *sb, const char *placeholder,
 	case 'd':
 		format_decoration(sb, commit);
 		return 1;
+	case 'N':
+		get_commit_notes(commit, sb, git_log_output_encoding ?
+			     git_log_output_encoding : git_commit_encoding, 0);
+		return 1;
 	}
 
 	/* For the rest we have to parse the commit header. */
-- 
1.6.4.304.g1365c.dirty


* [PATCHv5 10/14] Teach notes code to free its internal data structures on request.
  2009-09-08  2:26 [PATCHv5 00/14] git notes Johan Herland
                   ` (8 preceding siblings ...)
  2009-09-08  2:26 ` [PATCHv5 09/14] Add '%N'-format for pretty-printing commit notes Johan Herland
@ 2009-09-08  2:26 ` Johan Herland
  2009-09-08  2:26 ` [PATCHv5 11/14] Teach the notes lookup code to parse notes trees with various fanout schemes Johan Herland
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-08  2:26 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

There's no need to be rude to memory-conscious callers...

Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c |    7 +++++++
 notes.h |    2 ++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/notes.c b/notes.c
index 84c30c1..008c3d4 100644
--- a/notes.c
+++ b/notes.c
@@ -160,3 +160,10 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 
 	free(msg);
 }
+
+void free_commit_notes()
+{
+	free(hash_map.entries);
+	memset(&hash_map, 0, sizeof(struct hash_map));
+	initialized = 0;
+}
diff --git a/notes.h b/notes.h
index 7f3eed4..41802e5 100644
--- a/notes.h
+++ b/notes.h
@@ -7,4 +7,6 @@
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 		const char *output_encoding, int flags);
 
+void free_commit_notes();
+
 #endif
-- 
1.6.4.304.g1365c.dirty


* [PATCHv5 11/14] Teach the notes lookup code to parse notes trees with various fanout schemes
  2009-09-08  2:26 [PATCHv5 00/14] git notes Johan Herland
                   ` (9 preceding siblings ...)
  2009-09-08  2:26 ` [PATCHv5 10/14] Teach notes code to free its internal data structures on request Johan Herland
@ 2009-09-08  2:26 ` Johan Herland
  2009-09-08  2:27 ` [PATCHv5 12/14] Selftests verifying semantics when loading notes trees with various fanouts Johan Herland
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-08  2:26 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

The semantics used when parsing notes trees (with regard to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
  objects referencing a given commit, only one of those objects is used.
- If a notes object for a given commit is present in the "root" notes tree,
  no subtrees are consulted; the object in the root tree is used directly.
- If more than one subtree prefix-matches the given commit, only the
  subtree with the longest matching prefix is consulted. This means that
  if the given commit is e.g. "deadbeef", and the notes tree has subtrees
  "de" and "dead", then the following paths in the notes tree are
  searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must reference a whole number of bytes
  from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
  acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
  recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
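
The subtree selection rule above (whole bytes only, longest matching
prefix wins) can be sketched as a small standalone function. This is an
illustration only, not the code in this patch: the name pick_subtree() and
the representation of subtree names as an array of hex strings are
assumptions, and the root-tree rule (a note found directly in the root
wins) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: among the subtree names at one level of a notes
 * tree, return the index of the one to consult for the commit named by
 * hex, or -1 if none applies.
 *
 * A subtree qualifies only if its name is a prefix of hex and covers a
 * whole number of bytes (an even number of hex digits); the longest
 * qualifying name wins.
 */
int pick_subtree(const char *hex, const char *const *names, size_t count)
{
	size_t i, best_len = 0;
	int best = -1;

	assert(hex && (names || count == 0));
	for (i = 0; i < count; i++) {
		size_t len = strlen(names[i]);

		if (len == 0 || len % 2 != 0)
			continue; /* "d" and "dea" are not acceptable */
		if (strncmp(hex, names[i], len) != 0)
			continue; /* not a prefix of this commit */
		if (len > best_len) {
			best_len = len;
			best = (int)i;
		}
	}
	return best;
}
```

For the commit "deadbeef" and subtrees "de", "dea" and "dead", the sketch
picks "dead": "dea" is rejected for not covering whole bytes, and "de"
loses to the longer match, so "de/adbeef" is never searched.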

This patch changes the in-memory data structure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
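
A minimal sketch of the lookup idea, assuming the layout described above:
16-way internal nodes, leaf entries, and the node type kept in the low two
bits of each pointer. It is not the code in this patch. The function names
are hypothetical, subtree unpacking is left out entirely, and the sketch
assumes the high nibble of each byte is indexed first so that nibbles
follow the order of the hex digits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PTR_TYPE_NULL     0
#define PTR_TYPE_INTERNAL 1
#define PTR_TYPE_NOTE     2
#define PTR_TYPE_SUBTREE  3

struct int_node { void *a[16]; };
struct leaf_node {
	unsigned char key_sha1[20];
	unsigned char val_sha1[20];
};

/* n-th 4-bit nibble of a 20-byte key (assumption: high nibble first) */
unsigned int get_nibble(unsigned int n, const unsigned char *key)
{
	assert(key && n < 40);
	return (key[n >> 1] >> ((~n & 1u) << 2)) & 0x0f;
}

/* Store the type tag in the two low bits of a suitably aligned pointer */
void *set_ptr_type(void *ptr, unsigned int type)
{
	assert(((uintptr_t)ptr & 3) == 0 && type <= 3);
	return (void *)((uintptr_t)ptr | type);
}

unsigned int get_ptr_type(const void *ptr)
{
	return (unsigned int)((uintptr_t)ptr & 3);
}

void *clr_ptr_type(void *ptr)
{
	return (void *)((uintptr_t)ptr & ~(uintptr_t)3);
}

/* Heap-allocate a leaf so that its address has the low bits free */
struct leaf_node *new_leaf(const unsigned char *key)
{
	struct leaf_node *l = calloc(1, sizeof(*l));

	if (l)
		memcpy(l->key_sha1, key, 20);
	return l;
}

/*
 * Walk internal nodes one nibble at a time and return the note entry
 * for key, or NULL. Subtree entries are not unpacked in this sketch.
 */
struct leaf_node *find_note(struct int_node *tree, const unsigned char *key)
{
	unsigned int n;

	for (n = 0; n < 40; n++) {
		void *p = tree->a[get_nibble(n, key)];

		switch (get_ptr_type(p)) {
		case PTR_TYPE_INTERNAL:
			tree = clr_ptr_type(p);
			break;
		case PTR_TYPE_NOTE: {
			struct leaf_node *l = clr_ptr_type(p);
			return memcmp(key, l->key_sha1, 20) ? NULL : l;
		}
		default:
			return NULL; /* empty slot, or subtree (not handled) */
		}
	}
	return NULL;
}
```

Because a leaf can sit at any depth, a sparsely populated tree needs only a
few internal nodes, and a lookup touches one slot per level rather than
scanning all entries.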

The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).

In addition, there is the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
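
For reference, a 2/38 fanout puts the first two hex digits of the commit's
SHA1 into a directory name and the remaining 38 into the entry name. A
minimal sketch of that mapping, with a hypothetical helper name (this
series only reads such trees, it does not write them):

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: turn a 40-character hex SHA1 into the path it
 * would have under a 2/38 fanout, e.g. "de/adbeef...".
 * out must have room for 42 bytes (40 digits, '/', and the NUL).
 */
void fanout_2_38(const char *hex, char out[42])
{
	assert(hex && out && strlen(hex) == 40);
	memcpy(out, hex, 2);          /* directory: first byte in hex */
	out[2] = '/';
	memcpy(out + 3, hex + 2, 39); /* 38 remaining digits plus NUL */
}
```

Looking up a single note in such a tree only requires reading the root
tree and one subtree, instead of one flat tree holding every note.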

As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.

However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.

Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c |  317 +++++++++++++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 248 insertions(+), 69 deletions(-)

diff --git a/notes.c b/notes.c
index 008c3d4..6926aa6 100644
--- a/notes.c
+++ b/notes.c
@@ -6,103 +6,282 @@
 #include "strbuf.h"
 #include "tree-walk.h"

-struct entry {
-	unsigned char commit_sha1[20];
-	unsigned char notes_sha1[20];
+/*
+ * Use a non-balancing simple 16-tree structure with struct int_node as
+ * internal nodes, and struct leaf_node as leaf nodes. Each int_node has a
+ * 16-array of pointers to its children.
+ * The bottom 2 bits of each pointer is used to identify the pointer type
+ * - ptr & 3 == 0 - NULL pointer, assert(ptr == NULL)
+ * - ptr & 3 == 1 - pointer to next internal node - cast to struct int_node *
+ * - ptr & 3 == 2 - pointer to note entry - cast to struct leaf_node *
+ * - ptr & 3 == 3 - pointer to subtree entry - cast to struct leaf_node *
+ *
+ * The root node is a statically allocated struct int_node.
+ */
+struct int_node {
+	void *a[16];
 };

-struct hash_map {
-	struct entry *entries;
-	off_t count, size;
+/*
+ * Leaf nodes come in two variants, note entries and subtree entries,
+ * distinguished by the LSb of the leaf node pointer (see above).
+ * As a note entry, the key is the SHA1 of the referenced commit, and the
+ * value is the SHA1 of the note object.
+ * As a subtree entry, the key is the prefix SHA1 (w/trailing NULs) of the
+ * referenced commit, using the last byte of the key to store the length of
+ * the prefix. The value is the SHA1 of the tree object containing the notes
+ * subtree.
+ */
+struct leaf_node {
+	unsigned char key_sha1[20];
+	unsigned char val_sha1[20];
 };

-static int initialized;
-static struct hash_map hash_map;
+#define PTR_TYPE_NULL     0
+#define PTR_TYPE_INTERNAL 1
+#define PTR_TYPE_NOTE     2
+#define PTR_TYPE_SUBTREE  3

-static int hash_index(struct hash_map *map, const unsigned char *sha1)
-{
-	int i = ((*(unsigned int *)sha1) % map->size);
+#define GET_PTR_TYPE(ptr)       ((uintptr_t) (ptr) & 3)
+#define CLR_PTR_TYPE(ptr)       ((void *) ((uintptr_t) (ptr) & ~3))
+#define SET_PTR_TYPE(ptr, type) ((void *) ((uintptr_t) (ptr) | (type)))

-	for (;;) {
-		unsigned char *current = map->entries[i].commit_sha1;
+#define GET_NIBBLE(n, sha1) (((sha1[n >> 1]) >> ((n & 0x01) << 2)) & 0x0f)

-		if (!hashcmp(sha1, current))
-			return i;
+#define SUBTREE_SHA1_PREFIXCMP(key_sha1, subtree_sha1) \
+	(memcmp(key_sha1, subtree_sha1, subtree_sha1[19]))

-		if (is_null_sha1(current))
-			return -1 - i;
+static struct int_node root_node;

-		if (++i == map->size)
-			i = 0;
+static int initialized;
+
+static void load_subtree(struct leaf_node *subtree, struct int_node *node,
+		unsigned int n);
+
+/*
+ * To find a leaf_node:
+ * 1. Start at the root node, with n = 0
+ * 2. Use the nth nibble of the key as an index into a:
+ *    - If a[n] is an int_node, recurse into that node and increment n
+ *    - If a leaf_node with matching key, return leaf_node (assert note entry)
+ *    - If a matching subtree entry, unpack that subtree entry (and remove it);
+ *      restart search at the current level.
+ *    - Otherwise, we end up at a NULL pointer, or a non-matching leaf_node.
+ *      Backtrack out of the recursion, one level at a time and check a[0]:
+ *      - If a[0] at the current level is a matching subtree entry, unpack that
+ *        subtree entry (and remove it); restart search at the current level.
+ */
+static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
+		const unsigned char *key_sha1)
+{
+	struct leaf_node *l;
+	unsigned char i = GET_NIBBLE(n, key_sha1);
+	void *p = tree->a[i];
+
+	switch(GET_PTR_TYPE(p)) {
+	case PTR_TYPE_INTERNAL:
+		l = note_tree_find(CLR_PTR_TYPE(p), n + 1, key_sha1);
+		if (l)
+			return l;
+		break;
+	case PTR_TYPE_NOTE:
+		l = (struct leaf_node *) CLR_PTR_TYPE(p);
+		if (!hashcmp(key_sha1, l->key_sha1))
+			return l; /* return note object matching given key */
+		break;
+	case PTR_TYPE_SUBTREE:
+		l = (struct leaf_node *) CLR_PTR_TYPE(p);
+		if (!SUBTREE_SHA1_PREFIXCMP(key_sha1, l->key_sha1)) {
+			/* unpack tree and resume search */
+			tree->a[i] = NULL;
+			load_subtree(l, tree, n);
+			free(l);
+			return note_tree_find(tree, n, key_sha1);
+		}
+		break;
+	case PTR_TYPE_NULL:
+	default:
+		assert(!p);
+		break;
 	}
+
+	/*
+	 * Did not find key at this (or any lower) level.
+	 * Check if there's a matching subtree entry in tree->a[0].
+	 * If so, unpack tree and resume search.
+	 */
+	p = tree->a[0];
+	if (GET_PTR_TYPE(p) != PTR_TYPE_SUBTREE)
+		return NULL;
+	l = (struct leaf_node *) CLR_PTR_TYPE(p);
+	if (!SUBTREE_SHA1_PREFIXCMP(key_sha1, l->key_sha1)) {
+		/* unpack tree and resume search */
+		tree->a[0] = NULL;
+		load_subtree(l, tree, n);
+		free(l);
+		return note_tree_find(tree, n, key_sha1);
+	}
+	return NULL;
 }

-static void add_entry(const unsigned char *commit_sha1,
-		const unsigned char *notes_sha1)
+/*
+ * To insert a leaf_node:
+ * 1. Start at the root node, with n = 0
+ * 2. Use the nth nibble of the key as an index into a:
+ *    - If a[n] is NULL, store the tweaked pointer directly into a[n]
+ *    - If a[n] is an int_node, recurse into that node and increment n
+ *    - If a[n] is a leaf_node:
+ *      1. Check if they're equal, and handle that (abort? overwrite?)
+ *      2. Create a new int_node, and store both leaf_nodes there
+ *      3. Store the new int_node into a[n].
+ */
+static int note_tree_insert(struct int_node *tree, unsigned char n,
+		const struct leaf_node *entry, unsigned char type)
 {
-	int index;
-
-	if (hash_map.count + 1 > hash_map.size >> 1) {
-		int i, old_size = hash_map.size;
-		struct entry *old = hash_map.entries;
-
-		hash_map.size = old_size ? old_size << 1 : 64;
-		hash_map.entries = (struct entry *)
-			xcalloc(sizeof(struct entry), hash_map.size);
-
-		for (i = 0; i < old_size; i++)
-			if (!is_null_sha1(old[i].commit_sha1)) {
-				index = -1 - hash_index(&hash_map,
-						old[i].commit_sha1);
-				memcpy(hash_map.entries + index, old + i,
-					sizeof(struct entry));
-			}
-		free(old);
+	struct int_node *new_node;
+	const struct leaf_node *l;
+	int ret;
+	unsigned char i = GET_NIBBLE(n, entry->key_sha1);
+	void *p = tree->a[i];
+	assert(GET_PTR_TYPE(entry) == PTR_TYPE_NULL);
+	switch(GET_PTR_TYPE(p)) {
+	case PTR_TYPE_NULL:
+		assert(!p);
+		tree->a[i] = SET_PTR_TYPE(entry, type);
+		return 0;
+	case PTR_TYPE_INTERNAL:
+		return note_tree_insert(CLR_PTR_TYPE(p), n + 1, entry, type);
+	default:
+		assert(GET_PTR_TYPE(p) == PTR_TYPE_NOTE ||
+			GET_PTR_TYPE(p) == PTR_TYPE_SUBTREE);
+		l = (const struct leaf_node *) CLR_PTR_TYPE(p);
+		if (!hashcmp(entry->key_sha1, l->key_sha1))
+			return -1; /* abort insert on matching key */
+		new_node = (struct int_node *)
+			xcalloc(sizeof(struct int_node), 1);
+		ret = note_tree_insert(new_node, n + 1,
+			CLR_PTR_TYPE(p), GET_PTR_TYPE(p));
+		if (ret) {
+			free(new_node);
+			return -1;
+		}
+		tree->a[i] = SET_PTR_TYPE(new_node, PTR_TYPE_INTERNAL);
+		return note_tree_insert(new_node, n + 1, entry, type);
 	}
+}

-	index = hash_index(&hash_map, commit_sha1);
-	if (index < 0) {
-		index = -1 - index;
-		hash_map.count++;
+/* Free the entire notes data contained in the given tree */
+static void note_tree_free(struct int_node *tree)
+{
+	unsigned int i;
+	for (i = 0; i < 16; i++) {
+		void *p = tree->a[i];
+		switch(GET_PTR_TYPE(p)) {
+		case PTR_TYPE_INTERNAL:
+			note_tree_free(CLR_PTR_TYPE(p));
+			/* fall through */
+		case PTR_TYPE_NOTE:
+		case PTR_TYPE_SUBTREE:
+			free(CLR_PTR_TYPE(p));
+		}
 	}
+}

-	hashcpy(hash_map.entries[index].commit_sha1, commit_sha1);
-	hashcpy(hash_map.entries[index].notes_sha1, notes_sha1);
+/*
+ * Convert a partial SHA1 hex string to the corresponding partial SHA1 value.
+ * - hex      - Partial SHA1 segment in ASCII hex format
+ * - hex_len  - Length of above segment. Must be multiple of 2 between 0 and 40
+ * - sha1     - Partial SHA1 value is written here
+ * - sha1_len - Max #bytes to store in sha1. Must be >= hex_len / 2, and <= 20
+ * Returns -1 on error (invalid arguments or invalid SHA1 (not in hex format)).
+ * Otherwise, returns number of bytes written to sha1 (i.e. hex_len / 2).
+ * Pads sha1 with NULs up to sha1_len (not included in returned length).
+ */
+static int get_sha1_hex_segment(const char *hex, unsigned int hex_len,
+		unsigned char *sha1, unsigned int sha1_len)
+{
+	unsigned int i, len = hex_len >> 1;
+	if (hex_len % 2 != 0 || len > sha1_len)
+		return -1;
+	for (i = 0; i < len; i++) {
+		unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
+		if (val & ~0xff)
+			return -1;
+		*sha1++ = val;
+		hex += 2;
+	}
+	for (; i < sha1_len; i++)
+		*sha1++ = 0;
+	return len;
 }

-static void initialize_hash_map(const char *notes_ref_name)
+static void load_subtree(struct leaf_node *subtree, struct int_node *node,
+		unsigned int n)
 {
-	unsigned char sha1[20], commit_sha1[20];
-	unsigned mode;
+	unsigned char commit_sha1[20];
+	unsigned int prefix_len;
+	int status;
+	void *buf;
 	struct tree_desc desc;
 	struct name_entry entry;
-	void *buf;
+
+	buf = fill_tree_descriptor(&desc, subtree->val_sha1);
+	if (!buf)
+		die("Could not read %s for notes-index",
+		     sha1_to_hex(subtree->val_sha1));
+
+	prefix_len = subtree->key_sha1[19];
+	assert(prefix_len * 2 >= n);
+	memcpy(commit_sha1, subtree->key_sha1, prefix_len);
+	while (tree_entry(&desc, &entry)) {
+		int len = get_sha1_hex_segment(entry.path, strlen(entry.path),
+				commit_sha1 + prefix_len, 20 - prefix_len);
+		if (len < 0)
+			continue; /* entry.path is not a SHA1 sum. Skip */
+		len += prefix_len;
+
+		/*
+		 * If commit SHA1 is complete (len == 20), assume note object
+		 * If commit SHA1 is incomplete (len < 20), assume note subtree
+		 */
+		if (len <= 20) {
+			unsigned char type = PTR_TYPE_NOTE;
+			struct leaf_node *l = (struct leaf_node *)
+				xcalloc(sizeof(struct leaf_node), 1);
+			hashcpy(l->key_sha1, commit_sha1);
+			hashcpy(l->val_sha1, entry.sha1);
+			if (len < 20) {
+				l->key_sha1[19] = (unsigned char) len;
+				type = PTR_TYPE_SUBTREE;
+			}
+			status = note_tree_insert(node, n, l, type);
+			assert(!status);
+		}
+	}
+	free(buf);
+}
+
+static void initialize_notes(const char *notes_ref_name)
+{
+	unsigned char sha1[20], commit_sha1[20];
+	unsigned mode;
+	struct leaf_node root_tree;

 	if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
 	    get_tree_entry(commit_sha1, "", sha1, &mode))
 		return;

-	buf = fill_tree_descriptor(&desc, sha1);
-	if (!buf)
-		die("Could not read %s for notes-index", sha1_to_hex(sha1));
-
-	while (tree_entry(&desc, &entry))
-		if (!get_sha1(entry.path, commit_sha1))
-			add_entry(commit_sha1, entry.sha1);
-	free(buf);
+	hashclr(root_tree.key_sha1);
+	hashcpy(root_tree.val_sha1, sha1);
+	load_subtree(&root_tree, &root_node, 0);
 }

 static unsigned char *lookup_notes(const unsigned char *commit_sha1)
 {
-	int index;
-
-	if (!hash_map.size)
-		return NULL;
-
-	index = hash_index(&hash_map, commit_sha1);
-	if (index < 0)
-		return NULL;
-	return hash_map.entries[index].notes_sha1;
+	struct leaf_node *found = note_tree_find(&root_node, 0, commit_sha1);
+	if (found)
+		return found->val_sha1;
+	return NULL;
 }

 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
@@ -120,7 +299,7 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 			notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
 		else if (!notes_ref_name)
 			notes_ref_name = GIT_NOTES_DEFAULT_REF;
-		initialize_hash_map(notes_ref_name);
+		initialize_notes(notes_ref_name);
 		initialized = 1;
 	}

@@ -163,7 +342,7 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,

 void free_commit_notes()
 {
-	free(hash_map.entries);
-	memset(&hash_map, 0, sizeof(struct hash_map));
+	note_tree_free(&root_node);
+	memset(&root_node, 0, sizeof(struct int_node));
 	initialized = 0;
 }
--
1.6.4.304.g1365c.dirty


* [PATCHv5 12/14] Selftests verifying semantics when loading notes trees with various fanouts
  2009-09-08  2:26 [PATCHv5 00/14] git notes Johan Herland
                   ` (10 preceding siblings ...)
  2009-09-08  2:26 ` [PATCHv5 11/14] Teach the notes lookup code to parse notes trees with various fanout schemes Johan Herland
@ 2009-09-08  2:27 ` Johan Herland
  2009-09-08  2:27 ` [PATCHv5 13/14] Allow flexible organization of notes trees, using both commit date and SHA1 Johan Herland
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-08  2:27 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

Add selftests verifying:
- that we are able to parse notes trees with various fanout schemes
- that notes trees with conflicting fanout schemes are parsed as expected

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t3303-notes-subtrees.sh |  137 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 137 insertions(+), 0 deletions(-)
 create mode 100755 t/t3303-notes-subtrees.sh

diff --git a/t/t3303-notes-subtrees.sh b/t/t3303-notes-subtrees.sh
new file mode 100755
index 0000000..d24203e
--- /dev/null
+++ b/t/t3303-notes-subtrees.sh
@@ -0,0 +1,137 @@
+#!/bin/sh
+
+test_description='Test commit notes organized in subtrees'
+
+. ./test-lib.sh
+
+number_of_commits=100
+
+start_note_commit () {
+	test_tick &&
+	cat <<INPUT_END
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+notes
+COMMIT
+
+from refs/notes/commits^0
+deleteall
+INPUT_END
+
+}
+
+verify_notes () {
+	git log | grep "^    " > output &&
+	i=$number_of_commits &&
+	while [ $i -gt 0 ]; do
+		echo "    commit #$i" &&
+		echo "    note for commit #$i" &&
+		i=$(($i-1));
+	done > expect &&
+	test_cmp expect output
+}
+
+test_expect_success "setup: create $number_of_commits commits" '
+
+	(
+		nr=0 &&
+		while [ $nr -lt $number_of_commits ]; do
+			nr=$(($nr+1)) &&
+			test_tick &&
+			cat <<INPUT_END
+commit refs/heads/master
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit #$nr
+COMMIT
+
+M 644 inline file
+data <<EOF
+file in commit #$nr
+EOF
+
+INPUT_END
+
+		done &&
+		test_tick &&
+		cat <<INPUT_END
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+no notes
+COMMIT
+
+deleteall
+
+INPUT_END
+
+	) |
+	git fast-import --quiet &&
+	git config core.notesRef refs/notes/commits
+'
+
+test_sha1_based () {
+	(
+		start_note_commit &&
+		nr=$number_of_commits &&
+		git rev-list refs/heads/master |
+		while read sha1; do
+			note_path=$(echo "$sha1" | sed "$1")
+			cat <<INPUT_END &&
+M 100644 inline $note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+			nr=$(($nr-1))
+		done
+	) |
+	git fast-import --quiet
+}
+
+test_expect_success 'test notes in 2/38-fanout' 'test_sha1_based "s|^..|&/|"'
+test_expect_success 'verify notes in 2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 4/36-fanout' 'test_sha1_based "s|^....|&/|"'
+test_expect_success 'verify notes in 4/36-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 2/2/36-fanout' 'test_sha1_based "s|^\(..\)\(..\)|\1/\2/|"'
+test_expect_success 'verify notes in 2/2/36-fanout' 'verify_notes'
+
+test_preferred () {
+	(
+		start_note_commit &&
+		nr=$number_of_commits &&
+		git rev-list refs/heads/master |
+		while read sha1; do
+			preferred_note_path=$(echo "$sha1" | sed "$1")
+			ignored_note_path=$(echo "$sha1" | sed "$2")
+			cat <<INPUT_END &&
+M 100644 inline $ignored_note_path
+data <<EOF
+IGNORED note for commit #$nr
+EOF
+
+M 100644 inline $preferred_note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+			nr=$(($nr-1))
+		done
+	) |
+	git fast-import --quiet
+}
+
+test_expect_success 'test notes in 4/36-fanout overriding 2/38-fanout' 'test_preferred "s|^....|&/|" "s|^..|&/|"'
+test_expect_success 'verify notes in 4/36-fanout overriding 2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 2/38-fanout overriding 2/2/36-fanout' 'test_preferred "s|^..|&/|" "s|^\(..\)\(..\)|\1/\2/|"'
+test_expect_success 'verify notes in 2/38-fanout overriding 2/2/36-fanout' 'verify_notes'
+
+test_done
-- 
1.6.4.304.g1365c.dirty


* [PATCHv5 13/14] Allow flexible organization of notes trees, using both commit date and SHA1
  2009-09-08  2:26 [PATCHv5 00/14] git notes Johan Herland
                   ` (11 preceding siblings ...)
  2009-09-08  2:27 ` [PATCHv5 12/14] Selftests verifying semantics when loading notes trees with various fanouts Johan Herland
@ 2009-09-08  2:27 ` Johan Herland
  2009-09-08  2:27 ` [PATCHv5 14/14] Add test cases for date-based fanouts Johan Herland
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-08  2:27 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

This is a major expansion of the notes lookup code to allow for variations
in the notes tree organization. The variations allowed include mixing fanout
schemes based on the commit dates of the annotated commits (aka. date-based
fanout) with fanout schemes based on the SHA1 of the annotated commits (aka.
SHA1-based fanout).

Using date-based fanout in the notes tree structure enables considerable
speedup in the notes lookup process, since notes are almost always looked up
sequentially in the (reverse) chronological order of their associated commits.
Furthermore, organizing notes in a way that allows (near) sequential lookup
enables us to decrease memory consumption both by lazily loading parts of the
notes tree structure on-demand, and freeing parts of the notes structure that
are unlikely to be used again soon.

The new flexible organization of the notes tree changes the rules for valid
note tree entries. The new rules are as follows:

1. Note objects are named by the SHA1 of the commit they annotate, possibly
   split across several SHA1-based fanout levels (this is the same as is
   implemented earlier in this series).

2. Note entries are located within zero or more date-based fanout levels.

3. Date-based fanout schemes may use the year, month and day values of the
   associated commit's timestamp. The values must be prefixed by 'y', 'm'
   and 'd' (respectively) in the notes tree.

4. The date-based components can be combined in one fanout level, or split
   across multiple fanout levels. Individual components may not be split
   across multiple fanout levels.

5. The year/month/date values must be specified in that order, and month or
   date values may not occur without the preceding year or month value.

6. All entries of a tree object in the notes tree structure must follow the
   same scheme used at that level.

Thus, the following example note entries are all valid locations for a note
annotating commit 123456789abcdef0123456789abcdef0123456789 at 2009-09-01:
- 123456789abcdef0123456789abcdef0123456789
- 12/3456789abcdef0123456789abcdef0123456789
- 1234/56789abcdef0123456789abcdef0123456789
- 12/34/56789abcdef0123456789abcdef0123456789
- 1234/5678/9abcdef0123456789abcdef0123456789
- 1234/56/78/9abcdef0123456789abcdef0123456789
- y2009/123456789abcdef0123456789abcdef0123456789
- y2009/m09/12/3456789abcdef0123456789abcdef0123456789
- y2009/m09/d01/123456789abcdef0123456789abcdef0123456789
- y2009m09/12/34/56789abcdef0123456789abcdef0123456789
- y2009m09/d01/1234/5678/9abcdef0123456789abcdef0123456789
- y2009/m09d01/12/34/56/78/9abcdef0123456789abcdef0123456789
- y2009m09d01/123456789abcdef0123456789abcdef0123456789

Conversely, the following example note entries are all invalid:
- 1/23456789abcdef0123456789abcdef0123456789 (violates #1)
- 123/456789abcdef0123456789abcdef0123456789 (violates #1)
- 12/345/6789abcdef0123456789abcdef0123456789 (violates #1)
- y2009123456789abcdef0123456789abcdef0123456789 (violates #2)
- 2009/09/01/123456789abcdef0123456789abcdef0123456789 (violates #3)
- y20/09/m09/12/3456789abcdef0123456789abcdef0123456789 (violates #4)
- y20/09m09/d01/123456789abcdef0123456789abcdef0123456789 (violates #4)
- y2009m/09/12/34/56789abcdef0123456789abcdef0123456789 (violates #4)
- y2009/d01/1234/5678/9abcdef0123456789abcdef0123456789 (violates #5)
- m09/y2009/d01/12/34/56/78/9abcdef0123456789abcdef0123456789 (violates #5)

From rule #6, we see that the following example notes tree is valid:
- y2009m09/0123456789abcdef0123456789abcdef012345678
- y2009m09/123456789abcdef0123456789abcdef0123456789
- y2008m01/d31/23/456789abcdef0123456789abcdef0123456789a
- y2008m01/d31/34/56789abcdef0123456789abcdef0123456789ab
- y2008m01/d16/4567/89abcdef0123456789abcdef0123456789abc
- y2008m01/d16/5678/9abcdef0123456789abcdef0123456789abcd

Conversely the following structure is invalid (violates rule #6):
- y2009m09/0123456789abcdef0123456789abcdef012345678
- y2009m09/12/3456789abcdef0123456789abcdef0123456789
- y2008m01/d31/23/456789abcdef0123456789abcdef0123456789a
- y2008m01/34/56789abcdef0123456789abcdef0123456789ab
- y2008m01/d16/45/6789abcdef0123456789abcdef0123456789abc
- y2008/m01d16/5678/9abcdef0123456789abcdef0123456789abcd

The flexibility added by this patch adds considerable complexity to the notes
tree parser, but runtime and memory usage are not significantly affected
(except for the effects introduced by the chosen notes tree structure).

Internally, the 16-tree data structure introduced in earlier patches is still
used to hold the SHA1-based fanout levels and the note entries themselves.
However, this patch adds a hierarchical date-based linked-list structure
around the 16-tree structure that mirrors the fanout scheme used in the
actual notes tree.

Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c |  403 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 364 insertions(+), 39 deletions(-)

diff --git a/notes.c b/notes.c
index 6926aa6..a3e3f83 100644
--- a/notes.c
+++ b/notes.c
@@ -7,6 +7,70 @@
 #include "tree-walk.h"
 
 /*
+ * Format of entries in the notes tree structure:
+ *
+ * note-entry   ::= (period sep)? sha1-spec
+ * period       ::= year sep?
+ *                  (month sep?
+ *                   (date sep?)?
+ *                  )?;
+ * year         ::= 'y' yearnum;
+ * month        ::= 'm' monthnum;
+ * date         ::= 'd' datenum;
+ * yearnum      ::= # 4-digit decimal year, from annotated commit's timestamp;
+ * monthnum     ::= # 2-digit decimal month, from annotated commit's timestamp;
+ * datenum      ::= # 2-digit decimal date, from annotated commit's timestamp;
+ * sha1-spec    ::= (hex-fragment sep?){20}
+ * sep          ::= '/';
+ * hex-fragment ::= # Fragment of hexsha1 (2 bytes);
+ * hexsha1      ::= # SHA1 of annotated commit in hex format (40 bytes);
+ *
+ * Thus, the following example note entries are all valid:
+ * - 0123456789abcdef0123456789abcdef012345678
+ * - 01/23456789abcdef0123456789abcdef012345678
+ * - 0123/456789abcdef0123456789abcdef012345678
+ * - 01/23/456789abcdef0123456789abcdef012345678
+ * - 0123/4567/89abcdef0123456789abcdef012345678
+ * - 0123/45/67/89abcdef0123456789abcdef012345678
+ * - y2009/0123456789abcdef0123456789abcdef012345678
+ * - y2009/m09/01/23456789abcdef0123456789abcdef012345678
+ * - y2009/m09/d01/0123456789abcdef0123456789abcdef012345678
+ * - y2009m09/01/23/456789abcdef0123456789abcdef012345678
+ * - y2009m09/d01/0123/4567/89abcdef0123456789abcdef012345678
+ * - y2009/m09d01/01/23/45/67/89abcdef0123456789abcdef012345678
+ * - y2009m09d01/0123456789abcdef0123456789abcdef012345678
+ *
+ * and the following example note entries are all invalid:
+ * - 0/123456789abcdef0123456789abcdef012345678
+ * - 012/3456789abcdef0123456789abcdef012345678
+ * - 01/234/56789abcdef0123456789abcdef012345678
+ * - y20090123456789abcdef0123456789abcdef012345678
+ * - y20/09/m09/01/23456789abcdef0123456789abcdef012345678
+ * - y20/09m09/d01/0123456789abcdef0123456789abcdef012345678
+ * - y2009m/09/01/23/456789abcdef0123456789abcdef012345678
+ * - y2009/d01/0123/4567/89abcdef0123456789abcdef012345678
+ * - m09/y2009/d01/01/23/45/67/89abcdef0123456789abcdef012345678
+ *
+ * In addition to the above per-entry rules, we require that _all_ entries at
+ * a given level in the notes tree (levels are separated by '/') follow the
+ * exact same format at that level. Thus the following structure is valid:
+ * - y2009m09/0123456789abcdef0123456789abcdef012345678
+ * - y2009m09/123456789abcdef0123456789abcdef0123456789
+ * - y2008m01/d31/23/456789abcdef0123456789abcdef0123456789a
+ * - y2008m01/d31/34/56789abcdef0123456789abcdef0123456789ab
+ * - y2008m01/d16/4567/89abcdef0123456789abcdef0123456789abc
+ * - y2008m01/d16/5678/9abcdef0123456789abcdef0123456789abcd
+ *
+ * but the following structure is invalid:
+ * - y2009m09/0123456789abcdef0123456789abcdef012345678
+ * - y2009m09/12/3456789abcdef0123456789abcdef0123456789
+ * - y2008m01/d31/23/456789abcdef0123456789abcdef0123456789a
+ * - y2008m01/34/56789abcdef0123456789abcdef0123456789ab
+ * - y2008m01/d16/45/6789abcdef0123456789abcdef0123456789abc
+ * - y2008/m01d16/5678/9abcdef0123456789abcdef0123456789abcd
+ */
+
+/*
  * Use a non-balancing simple 16-tree structure with struct int_node as
  * internal nodes, and struct leaf_node as leaf nodes. Each int_node has a
  * 16-array of pointers to its children.
@@ -17,9 +81,45 @@
  * - ptr & 3 == 3 - pointer to subtree entry - cast to struct leaf_node *
  *
  * The root node is a statically allocated struct int_node.
+ *
+ * In order to allow date-based fanout schemes in addition to the original
+ * SHA1-based fanout schemes, we need to overload this structure, as follows:
+ * If the first pointer in the 16-array is ~0 (i.e. 0xffffffff on 32-bit
+ * systems and 0xffffffffffffffff on 64-bit systems), then the int_node is NOT
+ * to be interpreted as a 16-array of child node pointers. Rather, the int_node
+ * now represents a period-based node with the following properties:
+ * - The node has a pointer to a "child" node of type struct int_node, which is
+ *   EITHER a "regular" int_node object representing the root node of a 16-tree
+ *   structure holding notes associated with commits with timestamps within
+ *   that time period, OR another period-based int_node representing some
+ *   subdivision of the time period.
+ * - The node also has a pointer to a "previous" period-based int_node, which
+ *   represents the previous time period for which there exist note objects.
+ * - The node has a pointer to a "parent" node, which is the period-based
+ *   int_node that has this int_node as one of its children. This is needed
+ *   when traversing the date-based int_nodes looking for a period matching the
+ *   given commit. For top-level objects, this is set to NULL.
+ * - The node stores the SHA1 sum of the tree object that represents its child
+ *   (within the notes tree structure). Thus, we keep a reference to the child
+ *   structure without necessarily allocating the child node (and
+ *   underlying structure).
+ * - Finally, the node has a period string, which indicates the time period of
+ *   the notes contained within, typically of the form "YYYY", "YYYY-MM" or
+ *   "YYYY-MM-DD", depending on the granularity of the corresponding
+ *   period-based entries in the notes tree structure.
  */
 struct int_node {
-	void *a[16];
+	union {
+		void *a[16];
+		struct {
+			void *magic;  /* ~0 "enables" this part of the union */
+			struct int_node *child;
+			struct int_node *prev;
+			struct int_node *parent;
+			unsigned char tree_sha1[20];
+			char period[11];  /* Enough to hold "YYYY-MM-DD" */
+		};
+	};
 };
 
 /*
@@ -51,12 +151,18 @@ struct leaf_node {
 #define SUBTREE_SHA1_PREFIXCMP(key_sha1, subtree_sha1) \
 	(memcmp(key_sha1, subtree_sha1, subtree_sha1[19]))
 
+#define SUBTREE_DATE_PREFIXCMP(commit_date, subtree_date) \
+	(prefixcmp(commit_date, subtree_date))
+
 static struct int_node root_node;
 
+static struct int_node *cur_node;
+
 static int initialized;
 
-static void load_subtree(struct leaf_node *subtree, struct int_node *node,
-		unsigned int n);
+static void load_subtree(const unsigned char *sha1,
+		const unsigned char *prefix, unsigned int prefix_len,
+		struct int_node *node, struct int_node *parent, int n);
 
 /*
  * To find a leaf_node:
@@ -94,7 +200,8 @@ static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
 		if (!SUBTREE_SHA1_PREFIXCMP(key_sha1, l->key_sha1)) {
 			/* unpack tree and resume search */
 			tree->a[i] = NULL;
-			load_subtree(l, tree, n);
+			load_subtree(l->val_sha1, l->key_sha1, l->key_sha1[19],
+				     tree, NULL, (int) n);
 			free(l);
 			return note_tree_find(tree, n, key_sha1);
 		}
@@ -117,7 +224,8 @@ static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
 	if (!SUBTREE_SHA1_PREFIXCMP(key_sha1, l->key_sha1)) {
 		/* unpack tree and resume search */
 		tree->a[0] = NULL;
-		load_subtree(l, tree, n);
+		load_subtree(l->val_sha1, l->key_sha1, l->key_sha1[19], tree,
+			     NULL, (int) n);
 		free(l);
 		return note_tree_find(tree, n, key_sha1);
 	}
@@ -173,16 +281,28 @@ static int note_tree_insert(struct int_node *tree, unsigned char n,
 /* Free the entire notes data contained in the given tree */
 static void note_tree_free(struct int_node *tree)
 {
-	unsigned int i;
-	for (i = 0; i < 16; i++) {
-		void *p = tree->a[i];
-		switch(GET_PTR_TYPE(p)) {
-		case PTR_TYPE_INTERNAL:
-			note_tree_free(CLR_PTR_TYPE(p));
-			/* fall through */
-		case PTR_TYPE_NOTE:
-		case PTR_TYPE_SUBTREE:
-			free(CLR_PTR_TYPE(p));
+	if (tree->magic == (void *) ~0) {
+		if (tree->prev) {
+			note_tree_free(tree->prev);
+			free(tree->prev);
+		}
+		if (tree->child) {
+			note_tree_free(tree->child);
+			free(tree->child);
+		}
+	}
+	else {
+		unsigned int i;
+		for (i = 0; i < 16; i++) {
+			void *p = tree->a[i];
+			switch(GET_PTR_TYPE(p)) {
+			case PTR_TYPE_INTERNAL:
+				note_tree_free(CLR_PTR_TYPE(p));
+				/* fall through */
+			case PTR_TYPE_NOTE:
+			case PTR_TYPE_SUBTREE:
+				free(CLR_PTR_TYPE(p));
+			}
 		}
 	}
 }
@@ -215,29 +335,139 @@ static int get_sha1_hex_segment(const char *hex, unsigned int hex_len,
 	return len;
 }
 
-static void load_subtree(struct leaf_node *subtree, struct int_node *node,
-		unsigned int n)
+/*
+ * Parse year/month/date strings, and generate the corresponding period string
+ * for the given path entry:
+ * - prefix must follow one of these forms: "", "YYYY", "YYYY-MM"
+ * - path should follow one of these forms: "yYYYY", "yYYYYmMM", "yYYYYmMMdDD",
+ *   "mMMdDD", "mMM" or "dDD"
+ * The resulting string (which follows the form "YYYY", "YYYY-MM" or
+ * "YYYY-MM-DD") is returned as a static string. If path is not valid in the
+ * given (prefix) context, NULL is returned.
+ */
+static const char *parse_period(const char *prefix, unsigned int prefix_len,
+		const char *path, unsigned int path_len)
+{
+	static char result[11];
+	char expect_type;  /* y/m/d for year/month/day-based fanout */
+	unsigned int expect_len, value;
+	char *endptr, *target = result;
+
+	switch (prefix_len) {
+	case 0:
+		/* No prefix, expect year-based fanout in path */
+		expect_type = 'y';
+		expect_len = 4;
+		break;
+	case 4:
+		/* Year in prefix, expect month-based fanout in path */
+		expect_type = 'm';
+		expect_len = 2;
+		break;
+	case 7:
+		/* "YYYY-MM" in prefix, expect day-based fanout in path */
+		expect_type = 'd';
+		expect_len = 2;
+		break;
+	default:
+		die("Date-based notes tree loading invoked with invalid "
+		    "prefix '%.*s'", prefix_len, prefix);
+	}
+
+	if (path[0] != expect_type) {
+		warning("Unexpected entry path in date-based notes tree: '%s' "
+			"(skipping)", path);
+		return NULL;
+	}
+	value = (unsigned int) strtoul(path + 1, &endptr, 10);
+	switch (expect_type) {
+	case 'y':
+		if (value < 1969 || value >= 3000) {
+			warning("Invalid year value in date-based notes tree:"
+				" '%s' (skipping)", path);
+			return NULL;
+		}
+		break;
+	case 'm':
+		if (value < 1 || value > 12) {
+			warning("Invalid month value in date-based notes tree:"
+				" '%s' (skipping)", path);
+			return NULL;
+		}
+		break;
+	case 'd':
+		if (value < 1 || value > 31) {
+			warning("Invalid day value in date-based notes tree:"
+				" '%s' (skipping)", path);
+			return NULL;
+		}
+		break;
+	}
+
+	if (prefix == result) {
+		target = result + prefix_len;
+		prefix = NULL;
+		prefix_len = 0;
+	}
+	prefix_len = snprintf(target, 11, "%.*s%s%0*u", prefix_len, prefix,
+			      expect_len == 2 ? "-" : "", expect_len, value);
+	prefix_len += target - result;
+	assert(prefix_len < 11);
+
+	if (*endptr)  /* there are more components in this path */
+		return parse_period(result, prefix_len, endptr,
+				    path_len - (endptr - path));
+	return result;
+}
+
+static void load_date_subtree(struct tree_desc *tree_desc,
+		const char *prefix, unsigned int prefix_len,
+		struct int_node *node, struct int_node *parent)
+{
+	struct name_entry entry;
+	struct int_node *cur_node = NULL;
+	struct int_node *new_node;
+
+	while (tree_entry(tree_desc, &entry)) {
+		const char *period = parse_period(
+			prefix, prefix_len, entry.path, strlen(entry.path));
+		if (!period)
+			continue;
+		if (tree_desc->size)  /* this is not the last tree entry */
+			new_node = (struct int_node *)
+				xmalloc(sizeof(struct int_node));
+		else  /* this is the last entry, store directly into node */
+			new_node = node;
+
+		new_node->magic = (void *) ~0;
+		new_node->child = NULL;
+		new_node->prev = cur_node;
+		new_node->parent = parent;
+		hashcpy(new_node->tree_sha1, entry.sha1);
+		strcpy(new_node->period, period);
+		cur_node = new_node;
+	}
+	assert(!cur_node || cur_node == node);
+}
+
+static void load_sha1_subtree(struct tree_desc *tree_desc,
+		const unsigned char *prefix, unsigned int prefix_len,
+		struct int_node *node, unsigned char n)
 {
 	unsigned char commit_sha1[20];
-	unsigned int prefix_len;
 	int status;
-	void *buf;
-	struct tree_desc desc;
 	struct name_entry entry;
 
-	buf = fill_tree_descriptor(&desc, subtree->val_sha1);
-	if (!buf)
-		die("Could not read %s for notes-index",
-		     sha1_to_hex(subtree->val_sha1));
-
-	prefix_len = subtree->key_sha1[19];
 	assert(prefix_len * 2 >= n);
-	memcpy(commit_sha1, subtree->key_sha1, prefix_len);
-	while (tree_entry(&desc, &entry)) {
+	memcpy(commit_sha1, prefix, prefix_len);
+	while (tree_entry(tree_desc, &entry)) {
 		int len = get_sha1_hex_segment(entry.path, strlen(entry.path),
 				commit_sha1 + prefix_len, 20 - prefix_len);
-		if (len < 0)
+		if (len < 0) {
+			warning("Invalid value in notes tree: '%s' (skipping)",
+				entry.path);
 			continue; /* entry.path is not a SHA1 sum. Skip */
+		}
 		len += prefix_len;
 
 		/*
@@ -258,6 +488,42 @@ static void load_subtree(struct leaf_node *subtree, struct int_node *node,
 			assert(!status);
 		}
 	}
+}
+
+static void load_subtree(const unsigned char *sha1,
+		const unsigned char *prefix, unsigned int prefix_len,
+		struct int_node *node, struct int_node *parent, int n)
+{
+	void *buf;
+	struct tree_desc desc;
+
+	buf = fill_tree_descriptor(&desc, sha1);
+	if (!buf)
+		die("Could not read notes subtree at %s", sha1_to_hex(sha1));
+	/*
+	 * After fill_tree_descriptor(), we can peek at the first tree entry
+	 * in desc.entry.
+	 */
+	switch (desc.entry.path[0]) {
+	case 'd':
+		if (strlen(desc.entry.path) != 3)
+			break;
+		/* fall-through */
+	case 'm':
+	case 'y':
+		/* path cannot be a SHA1 fragment */
+		load_date_subtree(&desc, (const char *) prefix, prefix_len,
+				  node, parent);
+		free(buf);
+		return;
+	}
+	if (n < 0) {
+		/* Arriving from a date-based subtree; reset prefix */
+		n = 0;
+		prefix = NULL;
+		prefix_len = 0;
+	}
+	load_sha1_subtree(&desc, prefix, prefix_len, node, n);
 	free(buf);
 }
 
@@ -265,23 +531,81 @@ static void initialize_notes(const char *notes_ref_name)
 {
 	unsigned char sha1[20], commit_sha1[20];
 	unsigned mode;
-	struct leaf_node root_tree;
 
 	if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
 	    get_tree_entry(commit_sha1, "", sha1, &mode))
 		return;
 
-	hashclr(root_tree.key_sha1);
-	hashcpy(root_tree.val_sha1, sha1);
-	load_subtree(&root_tree, &root_node, 0);
+	load_subtree(sha1, NULL, 0, &root_node, NULL, 0);
+	cur_node = &root_node;
 }
 
-static unsigned char *lookup_notes(const unsigned char *commit_sha1)
+static unsigned char *lookup_notes(const struct commit *commit)
 {
-	struct leaf_node *found = note_tree_find(&root_node, 0, commit_sha1);
-	if (found)
-		return found->val_sha1;
-	return NULL;
+	struct int_node *node = cur_node, *seen_node = cur_node;
+	struct leaf_node *found;
+	const char *short_date;
+
+	if (!node)
+		return NULL;
+
+	/* Convert commit->date to YYYY-MM-DD format */
+	short_date = show_date(commit->date, 0, DATE_SHORT);
+
+	while (node->magic == (void *) ~0) {  /* date-based node */
+		int cmp = SUBTREE_DATE_PREFIXCMP(short_date, node->period);
+		if (cmp == 0) {
+			/* Search inside child node */
+			if (!node->child) {
+				/* Must unpack child node first */
+				node->child = (struct int_node *)
+					xcalloc(sizeof(struct int_node), 1);
+				load_subtree(node->tree_sha1,
+					(const unsigned char *) node->period,
+					strlen(node->period), node->child,
+					node, -1);
+			}
+			seen_node = node;
+			node = node->child;
+		}
+		else if (cmp > 0) {
+			/* Search in past node */
+			if (node->prev)
+				node = node->prev;
+			else
+				node = node->parent;
+		}
+		else {
+			/* Search in future node */
+			if (!node->parent) {
+				/* Restart from root_node */
+				seen_node = node;
+				node = &root_node;
+			}
+			else
+				node = node->parent;
+		}
+		if (!node || node == seen_node) {
+			/* We've been here before, give up search */
+			return NULL;
+		}
+	}
+	while (cur_node &&
+	       SUBTREE_DATE_PREFIXCMP(cur_node->period, seen_node->period) < 0)
+	{
+		/*
+		 * We're about to move cur_node backwards in history. We are
+		 * unlikely to need this cur_node in the future, so free() it.
+		 */
+		note_tree_free(cur_node->child);
+		cur_node->child = NULL;
+		cur_node = cur_node->parent;
+	}
+	cur_node = seen_node;
+
+	/* Drill down further with SHA1-based lookup */
+	found = note_tree_find(node, 0, commit->object.sha1);
+	return found ? found->val_sha1 : NULL;
 }
 
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
@@ -303,7 +627,7 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 		initialized = 1;
 	}
 
-	sha1 = lookup_notes(commit->object.sha1);
+	sha1 = lookup_notes(commit);
 	if (!sha1)
 		return;
 
@@ -342,6 +666,7 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 
 void free_commit_notes()
 {
+	cur_node = NULL;
 	note_tree_free(&root_node);
 	memset(&root_node, 0, sizeof(struct int_node));
 	initialized = 0;
-- 
1.6.4.304.g1365c.dirty


* [PATCHv5 14/14] Add test cases for date-based fanouts
  2009-09-08  2:26 [PATCHv5 00/14] git notes Johan Herland
                   ` (12 preceding siblings ...)
  2009-09-08  2:27 ` [PATCHv5 13/14] Allow flexible organization of notes trees, using both commit date and SHA1 Johan Herland
@ 2009-09-08  2:27 ` Johan Herland
  2009-09-08  3:12 ` [PATCHv5 00/14] git notes Johan Herland
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-08  2:27 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t3303-notes-subtrees.sh |   64 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 64 insertions(+), 0 deletions(-)

diff --git a/t/t3303-notes-subtrees.sh b/t/t3303-notes-subtrees.sh
index d24203e..d3cbf6d 100755
--- a/t/t3303-notes-subtrees.sh
+++ b/t/t3303-notes-subtrees.sh
@@ -134,4 +134,68 @@ test_expect_success 'verify notes in 4/36-fanout overriding 2/38-fanout' 'verify
 test_expect_success 'test notes in 2/38-fanout overriding 2/2/36-fanout' 'test_preferred "s|^..|&/|" "s|^\(..\)\(..\)|\1/\2/|"'
 test_expect_success 'verify notes in 2/38-fanout overriding 2/2/36-fanout' 'verify_notes'
 
+test_date_based () {
+	(
+		start_note_commit &&
+		nr=$number_of_commits &&
+		git log --format="%H %ct" refs/heads/master |
+		while read sha1 date_t; do
+			date=$(date -u -d "@$date_t" +"$1")
+			note_path="$date/$(echo "$sha1" | sed "$2")"
+			cat <<INPUT_END &&
+M 100644 inline $note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+			nr=$(($nr-1))
+		done
+	) |
+	git fast-import --quiet
+}
+
+test_expect_success 'test notes in y/40-fanout' 'test_date_based "y%Y" ""'
+test_expect_success 'verify notes in y/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/2/38-fanout' 'test_date_based "y%Y" "s|^..|&/|"'
+test_expect_success 'verify notes in y/2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in ym/40-fanout' 'test_date_based "y%Ym%m" ""'
+test_expect_success 'verify notes in ym/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in ym/2/38-fanout' 'test_date_based "y%Ym%m" "s|^..|&/|"'
+test_expect_success 'verify notes in ym/2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in ymd/40-fanout' 'test_date_based "y%Ym%md%d" ""'
+test_expect_success 'verify notes in ymd/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in ymd/2/38-fanout' 'test_date_based "y%Ym%md%d" "s|^..|&/|"'
+test_expect_success 'verify notes in ymd/2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/m/40-fanout' 'test_date_based "y%Y/m%m" ""'
+test_expect_success 'verify notes in y/m/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/m/2/38-fanout' 'test_date_based "y%Y/m%m" "s|^..|&/|"'
+test_expect_success 'verify notes in y/m/2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/md/40-fanout' 'test_date_based "y%Y/m%md%d" ""'
+test_expect_success 'verify notes in y/md/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/md/2/38-fanout' 'test_date_based "y%Y/m%md%d" "s|^..|&/|"'
+test_expect_success 'verify notes in y/md/2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in ym/d/40-fanout' 'test_date_based "y%Ym%m/d%d" ""'
+test_expect_success 'verify notes in ym/d/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in ym/d/2/38-fanout' 'test_date_based "y%Ym%m/d%d" "s|^..|&/|"'
+test_expect_success 'verify notes in ym/d/2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/m/d/40-fanout' 'test_date_based "y%Y/m%m/d%d" ""'
+test_expect_success 'verify notes in y/m/d/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/m/d/2/38-fanout' 'test_date_based "y%Y/m%m/d%d" "s|^..|&/|"'
+test_expect_success 'verify notes in y/m/d/2/38-fanout' 'verify_notes'
+
 test_done
-- 
1.6.4.304.g1365c.dirty


* Re: [PATCHv5 00/14] git notes
  2009-09-08  2:26 [PATCHv5 00/14] git notes Johan Herland
                   ` (13 preceding siblings ...)
  2009-09-08  2:27 ` [PATCHv5 14/14] Add test cases for date-based fanouts Johan Herland
@ 2009-09-08  3:12 ` Johan Herland
  2009-09-08  4:16   ` Junio C Hamano
  2009-09-12 15:50   ` Johan Herland
  2009-09-10 14:00 ` Geert Bosch
  2009-09-12  0:11 ` Junio C Hamano
  16 siblings, 2 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-08  3:12 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, trast, tavestbo, git, chriscool, spearce


On Tuesday 08 September 2009, Johan Herland wrote:
> I have some performance numbers that I will send in a separate email.

Ok, here we go:

Test scenario:
Linux kernel repo with 157118 commits, 1 note per commit, with notes
organized into various fanout schemes.
Hardware is Intel Core 2 Quad with 4GB RAM.

The tests were done on the following algorithms:

- "before": This is the state of the notes code after applying patches 1-9.
  It uses the original notes-in-hash-map implementation, and does not grok
  any fanout scheme.

- "16tree": This is the state of the notes code after applying patch 11.
  It uses the 16-tree data structure that parses the SHA-1 based fanout
  schemes.

- "flexible": This is the state of the notes code after applying the entire
  patch series. This code parses a variety of date- and SHA1-based fanout
  schemes.

Furthermore, the following notes tree structures were tested:

- "no-notes": Testing without any notes at all. This is only present as a
  baseline, and to verify that the notes code does not negatively affect
  performance when not in use.

- "no-fanout": All notes stored directly inside the root notes tree object.

- "2_38": All notes stored in a SHA1-based 2/38 fanout scheme.

- "2_2_36": All notes stored in a SHA1-based 2/2/36 fanout scheme.

- "ym": Notes are organized within "yYYYYmMM"-named subtrees, where "YYYY"
  and "MM" are the year and month (respectively) from the annotated commit's
  commit date.

- "ym_2_38": Same as above, but with a 2/38 SHA1-based fanout scheme within
  the "yYYYYmMM"-named subtrees.

- "ymd": Notes are organized within "yYYYYmMMdDD"-named subtrees.

- "ymd_2_38": Same as above, but with a 2/38 SHA1-based fanout scheme within
  the "yYYYYmMMdDD"-named subtrees.

- "y_m": Notes are organized within two-level "yYYYY/mMM" subtrees.

- "y_m_2_38": Same as above, but with a 2/38 SHA1-based fanout scheme within
  the "yYYYY/mMM"-named subtrees.

- "y_m_d": Notes are organized within three-level "yYYYY/mMM/dDD" subtrees.

- "y_m_d_2_38": Same as above, but with a 2/38 SHA1-based fanout scheme
  within the "yYYYY/mMM/dDD"-named subtrees.
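
To make the layouts above concrete, here is a minimal sketch (not part of
the patch series) of how the path of a single note could be derived for the
"y_m_d_2_38" structure. The helper name note_path_y_m_d_2_38() is made up
for illustration, and the sketch assumes that the commit date is interpreted
in UTC.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper, for illustration only: build the path of a note
 * inside a "yYYYY/mMM/dDD" notes tree with a 2/38 SHA1 fanout below it.
 *
 * commit_time is the commit date in seconds since the epoch (assumed UTC);
 * sha1_hex is the 40-character hex name of the annotated commit.
 * Returns 0 on success, -1 on invalid input or if buf is too small.
 */
static int note_path_y_m_d_2_38(time_t commit_time, const char *sha1_hex,
				char *buf, size_t buflen)
{
	const struct tm *tm = gmtime(&commit_time);
	int n;

	if (!tm || strlen(sha1_hex) != 40)
		return -1;
	/* "%.2s" emits the first two hex digits, the rest follows the '/' */
	n = snprintf(buf, buflen, "y%04d/m%02d/d%02d/%.2s/%s",
		     tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday,
		     sha1_hex, sha1_hex + 2);
	return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

The other structures follow the same pattern with fewer path components:
"ym" would drop the day and the slashes between the date parts, and the
variants without "_2_38" would append the full 40-character name instead of
splitting it.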


Here are the runtime numbers. The first column shows the runtime for 100
repetitions of "git log -n10" (which we assume to be a common use case),
and the second column shows the runtime of a single run of
"git log --all" (which is somewhat closer to a worst case).


Algorithm / Notes tree   git log -n10 (x100)   git log --all
------------------------------------------------------------
before / no-notes              4.78s              63.90s
before / no-fanout            56.85s              65.69s

16tree / no-notes              4.77s              64.18s
16tree / no-fanout            30.35s              65.39s
16tree / 2_38                  5.57s              65.42s
16tree / 2_2_36                5.19s              65.76s

flexible / no-notes            4.78s              63.91s
flexible / no-fanout          30.34s              65.57s
flexible / 2_38                5.57s              65.46s
flexible / 2_2_36              5.18s              65.72s
flexible / ym                  5.13s              65.66s
flexible / ym_2_38             5.08s              65.63s
flexible / ymd                 5.30s              65.45s
flexible / ymd_2_38            5.29s              65.90s
flexible / y_m                 5.11s              65.72s
flexible / y_m_2_38            5.08s              65.67s
flexible / y_m_d               5.06s              65.50s
flexible / y_m_d_2_38          5.07s              65.79s


Finally, I have also looked at the memory consumption of the various
algorithms and fanout schemes:

The memory usage was measured by counting the number of bytes dynamically
allocated for the notes data structure, and printing the current
usage every time get_commit_notes() was called during a complete run
of "git log --all".

The results are attached as two gnuplot graphs, one with regular
axes, and one with logarithmic axes.
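
The kind of byte counting described above could be instrumented roughly as
follows. This is only a sketch under stated assumptions, not the code that
produced the graphs: counted_alloc(), counted_free() and report_usage() are
hypothetical names, and the sketch assumes that every allocation made for
the notes data structure goes through these wrappers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Bytes currently allocated for the (hypothetical) notes structure */
static size_t notes_bytes_in_use;

/* Allocate zeroed memory and add its size to the running total */
static void *counted_alloc(size_t size)
{
	void *p = calloc(1, size);
	if (p)
		notes_bytes_in_use += size;
	return p;
}

/* Free memory and subtract its size; the caller must pass the same size */
static void counted_free(void *p, size_t size)
{
	if (!p)
		return;
	free(p);
	notes_bytes_in_use -= size;
}

/* Emit one "lookup-number bytes" line, suitable as gnuplot input */
static void report_usage(FILE *out, unsigned long lookup_nr)
{
	fprintf(out, "%lu %lu\n", lookup_nr,
		(unsigned long) notes_bytes_in_use);
}
```

Calling report_usage() once per notes lookup would yield one data point per
commit visited, which is the shape of data that a plot of memory usage over
a full "git log --all" run needs.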


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

[-- Attachment #2: memusage_gnuplot.png --]
[-- Type: image/png, Size: 18323 bytes --]

[-- Attachment #3: memusage_gnuplot_log.png --]
[-- Type: image/png, Size: 28646 bytes --]


* Re: [PATCHv5 00/14] git notes
  2009-09-08  3:12 ` [PATCHv5 00/14] git notes Johan Herland
@ 2009-09-08  4:16   ` Junio C Hamano
  2009-09-08  8:54     ` Johan Herland
  2009-09-12 15:50   ` Johan Herland
  1 sibling, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2009-09-08  4:16 UTC (permalink / raw)
  To: Johan Herland
  Cc: git, gitster, Johannes.Schindelin, trast, tavestbo, git,
	chriscool, spearce

Johan Herland <johan@herland.net> writes:

> Furthermore, the following notes tree structures were tested:
>
> - "no-notes": Testing without any notes at all. This is only present as a
>   baseline, and to verify that the notes code does not negatively affect
>   performance when not in use.

Minor nit.

For this to be a baseline, you would need to have another algorithm before
"before", i.e., without any of these notes implementations.

Comparison with "before" alone is not meaningful.  That is like starting
with a state with unknown performance regression compared to the stock
version, and then boasting about improvements made by various variations.

You would need to compare overhead of various "algorithms" with the stock
git in "no-notes" case as well.  It would give us the true performance
cost of supporting notes.


* Re: [PATCHv5 00/14] git notes
  2009-09-08  4:16   ` Junio C Hamano
@ 2009-09-08  8:54     ` Johan Herland
  2009-09-08  9:32       ` Johannes Schindelin
  0 siblings, 1 reply; 58+ messages in thread
From: Johan Herland @ 2009-09-08  8:54 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes.Schindelin, trast, tavestbo, git, chriscool, spearce

On Tuesday 08 September 2009, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > Furthermore, the following notes tree structures were tested:
> >
> > - "no-notes": Testing without any notes at all. This is only present as
> > a baseline, and to verify that the notes code does not negatively
> > affect performance when not in use.
> 
> Minor nit.
> 
> For this to be a baseline, you would need to have another algorithm
> before "before", i.e., without any of these notes implementations.
> 
> Comparison with "before" alone is not meaningful.  That is like starting
> with a state with unknown performance regression compared to the stock
> version, and then boasting about improvements made by various variations.
> 
> You would need to compare overhead of various "algorithms" with the stock
> git in "no-notes" case as well.  It would give us the true performance
> cost of supporting notes.

True. Here is the same table with the baseline ('next') entry on top:


Algorithm / Notes tree   git log -n10 (x100)   git log --all
------------------------------------------------------------
next / no-notes                4.77s              63.84s

before / no-notes              4.78s              63.90s
before / no-fanout            56.85s              65.69s

16tree / no-notes              4.77s              64.18s
16tree / no-fanout            30.35s              65.39s
16tree / 2_38                  5.57s              65.42s
16tree / 2_2_36                5.19s              65.76s

flexible / no-notes            4.78s              63.91s
flexible / no-fanout          30.34s              65.57s
flexible / 2_38                5.57s              65.46s
flexible / 2_2_36              5.18s              65.72s
flexible / ym                  5.13s              65.66s
flexible / ym_2_38             5.08s              65.63s
flexible / ymd                 5.30s              65.45s
flexible / ymd_2_38            5.29s              65.90s
flexible / y_m                 5.11s              65.72s
flexible / y_m_2_38            5.08s              65.67s
flexible / y_m_d               5.06s              65.50s
flexible / y_m_d_2_38          5.07s              65.79s


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCHv5 00/14] git notes
  2009-09-08  8:54     ` Johan Herland
@ 2009-09-08  9:32       ` Johannes Schindelin
  2009-09-08 12:36         ` Johan Herland
  2009-09-08 20:31         ` Junio C Hamano
  0 siblings, 2 replies; 58+ messages in thread
From: Johannes Schindelin @ 2009-09-08  9:32 UTC (permalink / raw)
  To: Johan Herland
  Cc: Junio C Hamano, git, trast, tavestbo, git, chriscool, spearce

Hi,

On Tue, 8 Sep 2009, Johan Herland wrote:

> Algorithm / Notes tree   git log -n10 (x100)   git log --all
> ------------------------------------------------------------
> next / no-notes                4.77s              63.84s
> 
> before / no-notes              4.78s              63.90s
> before / no-fanout            56.85s              65.69s
> 
> 16tree / no-notes              4.77s              64.18s
> 16tree / no-fanout            30.35s              65.39s
> 16tree / 2_38                  5.57s              65.42s
> 16tree / 2_2_36                5.19s              65.76s
> 
> flexible / no-notes            4.78s              63.91s
> flexible / no-fanout          30.34s              65.57s
> flexible / 2_38                5.57s              65.46s
> flexible / 2_2_36              5.18s              65.72s
> flexible / ym                  5.13s              65.66s
> flexible / ym_2_38             5.08s              65.63s
> flexible / ymd                 5.30s              65.45s
> flexible / ymd_2_38            5.29s              65.90s
> flexible / y_m                 5.11s              65.72s
> flexible / y_m_2_38            5.08s              65.67s
> flexible / y_m_d               5.06s              65.50s
> flexible / y_m_d_2_38          5.07s              65.79s

It's good to see that the no-notes behaves roughly like baseline.

I can see that some people may think that date-based fan-out is the cat's 
ass, but I have to warn that we have no idea how notes will be used, and 
the date-based fan-out is rather limiting in certain respects:

- for the typical nightly-build-generated notes, this fan-out is pretty 
  inefficient memory-wise.

- I find the restriction to commits rather limiting.

- most of the performance difference between the date-based and the SHA-1 
  based fan-out looks to me as if the issue was the top-level tree.  
  Basically, this tree has to be read _every_ time _anybody_ wants to read 
  a note.

  Maybe a finer-grained fan-out (finer than 16-bits) could help.  After 
  all, if you have 16 different notes, chances are that they have 16 
  different first letters, but all have the same commit year.  That's 
  where the top-level notes with a fan-out perform incredibly badly.

  But I think that having a dynamic fan-out that can even put blobs into 
  the top-level tree (nothing prevents us from doing that, right?) would 
  _outperform_ the date-based one, at least with less than 1 note/commit 
  (and maybe even then, because the year-based fan-out results in pretty 
  varying entropies per fan-out depth).

  The real question for me, therefore, is: what is the optimal way to 
  strike the balance between size of the tree objects (which we want to 
  be small, so that unpacking them is fast)  and depth of the fan-out 
  (which we want to be shallow to avoid reading worst-case 39 tree objects 
  to get at one single note).

- related to the previous point is my gut feeling that the date-based 
  fan-out has nothing to do with any theoretical optimum.  I am pretty 
  certain that the optimal fan-out strategy depends heavily on the SHA-1s 
  of the annotated objects (if you have 10,000 notes in 2009, but only 1 
  in 2008, the year-based fan-out _must_ be suboptimal)  and maybe is 
  something like a sibling to the Fibonacci heap.

- I'd love to see performance numbers for fewer than 157118 notes.  Don't 
  get me wrong, it is good to see the worst-case scenario in terms of 
  notes/commits ratio.  But it will hardly be the common case, and I 
  very much would like to optimize for the common case.

  So, I'd appreciate it if you could do the tests with something like 500 
  notes, randomly spread over the commits (rationale: my original 
  understanding was that the notes could amend commit messages, and that 
  is much more likely to be done with relatively old commits that you 
  cannot change anymore).

Please understand that I might not have the time to participate in this 
thread as much as I would like to.  The next 4 days will be especially 
hard.

Ciao,
Dscho


* Re: [PATCHv5 00/14] git notes
  2009-09-08  9:32       ` Johannes Schindelin
@ 2009-09-08 12:36         ` Johan Herland
  2009-09-08 15:53           ` Johannes Schindelin
  2009-09-10  9:25           ` Johan Herland
  2009-09-08 20:31         ` Junio C Hamano
  1 sibling, 2 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-08 12:36 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, trast, tavestbo, git, chriscool, spearce

Hi,

On Tuesday 08 September 2009, Johannes Schindelin wrote:
> On Tue, 8 Sep 2009, Johan Herland wrote:
> > Algorithm / Notes tree   git log -n10 (x100)   git log --all
> > ------------------------------------------------------------
> > next / no-notes                4.77s              63.84s
> >
> > before / no-notes              4.78s              63.90s
> > before / no-fanout            56.85s              65.69s
> >
> > 16tree / no-notes              4.77s              64.18s
> > 16tree / no-fanout            30.35s              65.39s
> > 16tree / 2_38                  5.57s              65.42s
> > 16tree / 2_2_36                5.19s              65.76s
> >
> > flexible / no-notes            4.78s              63.91s
> > flexible / no-fanout          30.34s              65.57s
> > flexible / 2_38                5.57s              65.46s
> > flexible / 2_2_36              5.18s              65.72s
> > flexible / ym                  5.13s              65.66s
> > flexible / ym_2_38             5.08s              65.63s
> > flexible / ymd                 5.30s              65.45s
> > flexible / ymd_2_38            5.29s              65.90s
> > flexible / y_m                 5.11s              65.72s
> > flexible / y_m_2_38            5.08s              65.67s
> > flexible / y_m_d               5.06s              65.50s
> > flexible / y_m_d_2_38          5.07s              65.79s
>
> It's good to see that the no-notes behaves roughly like baseline.

Agreed.

> I can see that some people may think that date-based fan-out is the
> cat's ass, but I have to warn that we have no idea how notes will be
> used,

I don't agree. Although we will certainly see many more use cases for 
notes, I believe that the vast majority of them can be placed in one of 
two categories:

1. Looking up a few (1 - 10) notes on an individual basis. In this case, 
performance will always be "good enough", and I don't believe it's 
worth spending much time optimizing for this case.

2. Looking up many notes in a sequence based on the chronology of the 
objects (commits) they annotate. This is what 'git log' does, and I 
believe this is the case we should optimize for.

Once you work from this assumption, it is clear that date-based fanout 
beats pure SHA1-based fanout, because it changes the notes lookup 
process from random-access to near-streaming. This is clearly reflected 
in the memory usage graphs I posted.

Also note that the "flexible" code to some degree resolves the whole 
date-based fanout vs. SHA1-based fanout discussion: We are now free to 
choose -- at runtime -- which notes tree structure is best suited to 
a given collection of notes.

> and the date-based fan-out is rather limiting in certain respects:
>
> - for the typical nightly-build-generated notes, this fan-out is
> pretty inefficient memory-wise.

Yes and no. A y/m/d/40 structure with 1 note for every y/m/d is indeed 
wasteful, but using a y/40 structure instead creates a much better 
situation with a "healthy" ~365 notes per year level. And the y/40 
still preserves some of the 'streaming' aspects mentioned above.

> - I find the restriction to commits rather limiting.

I see your point, but I don't agree until I see a compelling case for 
annotating a non-commit.

In any case, the flexible lookup code still allows us to organize notes 
purely by SHA1-based fanout, so annotated trees/blobs could still be 
supported by the same code (modulo a small s/struct commit/struct 
object/ patch) provided that they are stored in a notes tree that 
simply does not use date-based fanout.

> - most of the performance difference between the date-based and the
> SHA-1 based fan-out looks to me as if the issue was the top-level
> tree. Basically, this tree has to be read _every_ time _anybody_
> wants to read a note.

Not sure what you're trying to say here. The top-level notes tree is 
read (as in fill_tree_descriptor()) exactly _once_. After that, it is 
cached by the internal data structure (until free_commit_notes() or 
end-of-process).

In the SHA1-based case, each notes lookup does indeed start at the root 
of the data structure (corresponding to the top-level tree), but in the 
date-based case, we keep a pointer (cur_node) to the innermost 
date-based tree that was most recently used. Thus, if the next note is 
within the same date-based tree (which I assume is the common case), 
then we don't need to look at the root of the data structure.
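
The check that decides whether the cached date-based tree can be reused
boils down to a prefix comparison between the commit's short date and the
period covered by the cached subtree. The following is a simplified sketch
of that idea, not the macro used in the patch: date_prefix_cmp() and
can_reuse_cached_period() are hypothetical names, and the sketch assumes
that periods are stored as "YYYY", "YYYY-MM" or "YYYY-MM-DD" strings.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: compare a commit's short date ("YYYY-MM-DD")
 * against the period covered by a date-based subtree ("YYYY", "YYYY-MM"
 * or "YYYY-MM-DD"). Only the first strlen(period) characters matter.
 * Returns 0 if the date falls inside the period, a negative value if it
 * lies before the period, and a positive value if it lies after it.
 */
static int date_prefix_cmp(const char *short_date, const char *period)
{
	return strncmp(short_date, period, strlen(period));
}

/*
 * Returns 1 if a lookup for short_date can continue inside the cached
 * subtree, i.e. without walking back towards the root of the structure.
 */
static int can_reuse_cached_period(const char *short_date,
				   const char *cached_period)
{
	return date_prefix_cmp(short_date, cached_period) == 0;
}
```

With commits visited in roughly chronological order, consecutive lookups
mostly compare equal against the cached period, so the walk back towards
the root is only needed when a lookup crosses a period boundary.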

>   Maybe a finer-grained fan-out (finer than 16-bits) could help. 
> After all, if you have 16 different notes, chances are that they have
> 16 different first letters, but all have the same commit year. 
> That's where the top-level notes with a fan-out perform incredibly
> badly.

Not really, the first lookup would start at the root node, and navigate 
into the year node, but all subsequent lookups would start directly at 
the year node (and only backtrack if the commit year doesn't match the 
year node).

BTW, when you mention "finer than 16-bits", do you mean moving from a 
16-tree to a, say, 32-tree or 64-tree structure? That will complicate 
the tree navigation somewhat, and increase memory waste. (Remember, I 
started out with a 256-tree idea, but scrapped it because of memory 
waste).

>   But I think that having a dynamic fan-out that can even put blobs
> into the top-level tree (nothing prevents us from doing that, right?)

Well, the "flexible" code does add the new requirement that all entries 
in a notes (sub)tree object must follow the same scheme, i.e. you 
cannot have:

  /12/34567890123456789012345678901234567890
  /2345/678901234567890123456789012345678901

but you can have

  /12/34567890123456789012345678901234567890
  /23/45/678901234567890123456789012345678901

> would _outperform_ the date-based one, at least with less than 1
> note/commit (and maybe even then, because the year-based fan-out
> results in pretty varying entropies per fan-out depth).
>
>   The real question for me, therefore, is: what is the optimal way to
>   strike the balance between size of the tree objects (which we want
> to be small, so that unpacking them is fast)  and depth of the
> fan-out (which we want to be shallow to avoid reading worst-case 39
> tree objects to get at one single note).

s/39/19/ (each fanout must use at least 2 chars of the 40-char SHA1)

Yes, the challenge is indeed striking the correct balance. I believe 
that the notes code should be taught to write (and automatically 
re-organize) the notes tree so that it is optimized for the current 
collection of notes.

> - related to the previous point is my gut feeling that the date-based
>   fan-out has nothing to do with any theoretical optimum.  I am
> pretty certain that the optimal fan-out strategy depends heavily on
> the SHA-1s of the annotated objects (if you have 10,000 notes in
> 2009, but only 1 in 2008, the year-based fan-out _must_ be
> suboptimal)  and maybe is something like a sibling to the Fibonacci
> heap.

Yes, it is trivial to create scenarios where any rigid date-based fanout 
is suboptimal. However, it is equally trivial to create scenarios where 
any SHA1-based fanout will perform worse than a carefully chosen 
date-based fanout. I believe the best way forward is to design for 
flexibility in the notes tree structure, and then teach the notes 
_writing_ code to choose a notes tree structure that is good/optimal 
for the given set of notes.

> - I'd love to see performance numbers for fewer than 157118 notes. 
> Don't get me wrong, it is good to see the worst-case scenario in
> terms of notes/commits ratio.  But it will hardly be the common case,
> and I very much would like to optimize for the common case.
>
>   So, I'd appreciate if you could do the tests with something like
> 500 notes, randomly spread over the commits (rationale: my original
> understanding was that the notes could amend commit messages, and
> that is much more likely to be done with relatively old commits that
> you cannot change anymore).

Ok. I will try to test that.

> Please understand that I might not have the time to participate in
> this thread as much as I would like to.  The next 4 days will be
> especially hard.

Thanks for the feedback! I appreciate any time you're able to spend on 
this. And I don't mind waiting for a few days for more feedback.


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCHv5 00/14] git notes
  2009-09-08 12:36         ` Johan Herland
@ 2009-09-08 15:53           ` Johannes Schindelin
  2009-09-08 22:46             ` Johan Herland
  2009-09-10  9:25           ` Johan Herland
  1 sibling, 1 reply; 58+ messages in thread
From: Johannes Schindelin @ 2009-09-08 15:53 UTC (permalink / raw)
  To: Johan Herland
  Cc: git, Junio C Hamano, trast, tavestbo, git, chriscool, spearce

Hi,

On Tue, 8 Sep 2009, Johan Herland wrote:

> On Tuesday 08 September 2009, Johannes Schindelin wrote:

> > I can see that some people may think that date-based fan-out is the 
> > cat's ass, but I have to warn that we have no idea how notes will be 
> > used,
> 
> I don't agree. Although we will certainly see many more use cases for 
> notes, I believe that the vast majority of them can be placed in one of 
> two categories:

My experience with Git is that having beliefs how my work is used was a 
constant source of surprise.

> > - I find the restriction to commits rather limiting.
> 
> I see your point, but I don't agree until I see a compelling case for 
> annotating a non-commit.

My point is that it is too late by then, if you don't allow for a flexible 
and still efficient scheme.

> > - most of the performance difference between the date-based and the 
> >   SHA-1 based fan-out looks to me as if the issue was the top-level 
> >   tree. Basically, this tree has to be read _every_ time _anybody_ 
> >   wants to read a note.
> 
> Not sure what you're trying to say here. The top-level notes tree is 
> read (as in fill_tree_descriptor()) exactly _once_. After that, it is 
> cached by the internal data structure (until free_commit_notes() or 
> end-of-process).

By that reasoning, we do not need any fan-out scheme.

Keep in mind: reading a large tree object takes a long time.  That's why 
we started fan-out.  Reading a large number of tree objects also takes a 
long time.  That's why I propagated flexible fan-out that is only read-in 
on demand.

> > But I think that having a dynamic fan-out that can even put blobs into 
> > the top-level tree (nothing prevents us from doing that, right?)
> 
> Well, the "flexible" code does add the new requirement that all entries 
> in a notes (sub)tree object must follow the same scheme, i.e. you 
> cannot have:
> 
>   /12/34567890123456789012345678901234567890
>   /2345/678901234567890123456789012345678901
> 
> but you can have
> 
>   /12/34567890123456789012345678901234567890
>   /23/45/678901234567890123456789012345678901

Umm, why?  Is there any good technical reason?

> > The real question for me, therefore, is: what is the optimal way to 
> > strike the balance between size of the tree objects (which we want to 
> > be small, so that unpacking them is fast)  and depth of the fan-out 
> > (which we want to be shallow to avoid reading worst-case 39 tree 
> > objects to get at one single note).
> 
> s/39/19/ (each fanout must use at least 2 chars of the 40-char SHA1)

That is another unnecessary restriction that could cost you dearly.  Just 
think what happens if it turns out that the optimal number of tree items 
is closer to 16 than to 255...

> Yes, the challenge is indeed striking the correct balance. I believe 
> that the notes code should be taught to write (and automatically 
> re-organize) the notes tree so that it is optimized for the current 
> collection of notes.

Of course!  I never thought that the user should be allowed to make the 
choice how to organize the notes.

Ciao,
Dscho


* Re: [PATCHv5 00/14] git notes
  2009-09-08  9:32       ` Johannes Schindelin
  2009-09-08 12:36         ` Johan Herland
@ 2009-09-08 20:31         ` Junio C Hamano
  2009-09-08 21:10           ` Shawn O. Pearce
  2009-09-08 21:40           ` Johan Herland
  1 sibling, 2 replies; 58+ messages in thread
From: Junio C Hamano @ 2009-09-08 20:31 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johan Herland, git, trast, tavestbo, git, chriscool, spearce

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
>
> On Tue, 8 Sep 2009, Johan Herland wrote:
>
>> Algorithm / Notes tree   git log -n10 (x100)   git log --all
>> ------------------------------------------------------------
>> next / no-notes                4.77s              63.84s
>> 
>> before / no-notes              4.78s              63.90s
>> before / no-fanout            56.85s              65.69s
>> 
>> 16tree / no-notes              4.77s              64.18s
>> 16tree / no-fanout            30.35s              65.39s
>> 16tree / 2_38                  5.57s              65.42s
>> 16tree / 2_2_36                5.19s              65.76s
>> 
>> flexible / no-notes            4.78s              63.91s
>> flexible / no-fanout          30.34s              65.57s
>> flexible / 2_38                5.57s              65.46s
>> flexible / 2_2_36              5.18s              65.72s
>> flexible / ym                  5.13s              65.66s
>> flexible / ym_2_38             5.08s              65.63s
>> flexible / ymd                 5.30s              65.45s
>> flexible / ymd_2_38            5.29s              65.90s
>> flexible / y_m                 5.11s              65.72s
>> flexible / y_m_2_38            5.08s              65.67s
>> flexible / y_m_d               5.06s              65.50s
>> flexible / y_m_d_2_38          5.07s              65.79s
>
> It's good to see that the no-notes behaves roughly like baseline.
>
> I can see that some people may think that date-based fan-out is the cat's 
> ass,

Actually, my knee-jerk reaction was that 4.77 (next) vs 5.57 (16tree with
2_38) is already a good enough performance/simplicity tradeoff, and 5.57
vs 5.08 (16tree with ym_2_38) probably does not justify the risk of worst
case behaviour that can come from possible mismatch between the access
pattern and the date-optimized tree layout.

But that only argues against supporting _only_ date-optimized layout.

Support of "flexible layout" is not as flexible as its name suggests;
one single note tree needs to have a uniform fanout strategy.  But it is
not unusably rigid either; you only need to be extra careful when merging
two notes trees.  We can leave the heuristics for choosing the optimum
layout to later rounds.

> - I find the restriction to commits rather limiting.

Yeah, we would not want to be surprised to find many people want to
annotate non-commits with this mechanism.

> - most of the performance difference between the date-based and the SHA-1 
>   based fan-out looks to me as if the issue was the top-level tree.  
>   Basically, this tree has to be read _every_ time _anybody_ wants to read 
>   a note.

A comparison between 'next' and another algorithm that opens the top-level
notes tree object and returns "I did not find any note" without doing
anything else would reveal that cost.  But when you are doing "log -n10"
(or "log --all"), you would read the notes top-level tree once, and it is
likely to be cached in the obj_hash[] (or in delta_base cache) already for
the remaining invocations, even if notes mechanism does not do its own
cache, which I think it does, no?

> - I'd love to see performance numbers for less than 157118 notes.  Don't 
>   get me wrong, it is good to see the worst-case scenario in terms of 
>   notes/commits ratio.  But it will hardly be the common case, and I 
>   very much would like to optimize for the common case.
>
>   So, I'd appreciate if you could do the tests with something like 500 
>   notes, randomly spread over the commits (rationale: my original 
>   understanding was that the notes could amend commit messages, and that 
>   is much more likely to be done with relatively old commits that you 
>   cannot change anymore).

Hmph, is that a typical use case?  How does it relate to CC's object
replacement mechanism?

Also Gitney talked about annotating commits in the code-review thing.
What's the expected notes density and distribution in that application?


* Re: [PATCHv5 00/14] git notes
  2009-09-08 20:31         ` Junio C Hamano
@ 2009-09-08 21:10           ` Shawn O. Pearce
  2009-09-08 21:36             ` Sverre Rabbelier
  2009-09-08 21:40           ` Johan Herland
  1 sibling, 1 reply; 58+ messages in thread
From: Shawn O. Pearce @ 2009-09-08 21:10 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Johan Herland, git, trast, tavestbo, git, chriscool

Junio C Hamano <gitster@pobox.com> wrote:
> Also Gitney talked about annotating commits in the code-review thing.
> What's the expected notes density and distribution in that application?

Uh, try one note per commit in a project.  A few merges won't need
a note, but nearly every single non-merge commit would.

Consider a project with a velocity of about 200 non-merge
commits/day; the object count goes up fast.

One idea we are starting to kick around might double or quadruple
that number.  If we store metadata about every version of every
commit ever proposed to a project, we need a lot more notes than
commits.  Right now we have this sort of distribution from one of
our servers:

versions | commits 
---------+---------
       1 |    9262
       2 |    2626
       3 |    1053
       4 |     424
       5 |     224
       6 |     124
       7 |      57
       8 |      38
       9 |      28
      10 |      14
      11 |      12
      12 |      10
      13 |       5
      14 |       6
      15 |       2
      16 |       3
      17 |       2
      21 |       1
      32 |       1

So most commits (66%) would have only 1 version (and 1 note)
related to them in the note tree, but if I use the same note tree
for final commits as individual revisions considered, at least 18%
of the commits in the final history of the project would actually
have two notes, and 7.5% would have 3 notes.
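
[Editorial sketch] The percentages above can be re-derived from the
table. The snippet below copies the table verbatim and assumes, as the
text suggests, one note per proposed version of a commit; the names
VERSIONS_TO_COMMITS and summarize are made up for this illustration.

```python
# Distribution copied from the table above: versions -> number of commits.
VERSIONS_TO_COMMITS = {
    1: 9262, 2: 2626, 3: 1053, 4: 424, 5: 224, 6: 124, 7: 57, 8: 38,
    9: 28, 10: 14, 11: 12, 12: 10, 13: 5, 14: 6, 15: 2, 16: 3, 17: 2,
    21: 1, 32: 1,
}


def summarize(dist):
    """Return (total commits, total notes, share of commits per version count).

    Assumption: every version of a commit carries exactly one note.
    """
    total_commits = sum(dist.values())
    total_notes = sum(versions * commits for versions, commits in dist.items())
    share = {versions: commits / total_commits for versions, commits in dist.items()}
    return total_commits, total_notes, share


total_commits, total_notes, share = summarize(VERSIONS_TO_COMMITS)
print(f"commits: {total_commits}, notes: {total_notes}")
print(f"1 version: {share[1]:.1%}, 2 versions: {share[2]:.1%}, "
      f"3 versions: {share[3]:.1%}")
print(f"average notes per final commit: {total_notes / total_commits:.2f}")
```

Under that assumption the notes tree ends up holding noticeably more
entries than there are commits in the final history.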

-- 
Shawn.


* Re: [PATCHv5 00/14] git notes
  2009-09-08 21:10           ` Shawn O. Pearce
@ 2009-09-08 21:36             ` Sverre Rabbelier
  2009-09-08 21:39               ` Shawn O. Pearce
  0 siblings, 1 reply; 58+ messages in thread
From: Sverre Rabbelier @ 2009-09-08 21:36 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Junio C Hamano, Johannes Schindelin, Johan Herland, git, trast,
	tavestbo, git, chriscool

Heya,

On Tue, Sep 8, 2009 at 23:10, Shawn O. Pearce<spearce@spearce.org> wrote:
> So most commits (66%) would have only 1 version (and 1 note)
> related to them in the note tree, but if I use the same note tree
> for final commits as individual revisions considered, at least 18%
> of the commits in the final history of the project would actually
> have two notes, and 7.5% would have 3 notes.

You could however store all that information in one note, yes? Since
the 'latest version' is the one committed, you can include the notes
for all the previous ones at commit time?

-- 
Cheers,

Sverre Rabbelier


* Re: [PATCHv5 00/14] git notes
  2009-09-08 21:36             ` Sverre Rabbelier
@ 2009-09-08 21:39               ` Shawn O. Pearce
  2009-09-08 21:57                 ` Sverre Rabbelier
  0 siblings, 1 reply; 58+ messages in thread
From: Shawn O. Pearce @ 2009-09-08 21:39 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: Junio C Hamano, Johannes Schindelin, Johan Herland, git, trast,
	tavestbo, git, chriscool

Sverre Rabbelier <srabbelier@gmail.com> wrote:
> On Tue, Sep 8, 2009 at 23:10, Shawn O. Pearce<spearce@spearce.org> wrote:
> > So most commits (66%) would have only 1 version (and 1 note)
> > related to them in the note tree, but if I use the same note tree
> > for final commits as individual revisions considered, at least 18%
> > of the commits in the final history of the project would actually
> > have two notes, and 7.5% would have 3 notes.
> 
> You could however store all that information in one note, yes? Since
> the 'latest version' is the one committed, you can include the notes
> for all the previous ones at commit time?

Uh, but the natural way to index those is by commit, and each
different revision of a change is a different commit.  Why delete
the prior revision information and move it to the final commit note?
Someone who has the prior revisions in their reflog and is doing
`git log -g --notes` might want to see that annotation.

-- 
Shawn.


* Re: [PATCHv5 00/14] git notes
  2009-09-08 20:31         ` Junio C Hamano
  2009-09-08 21:10           ` Shawn O. Pearce
@ 2009-09-08 21:40           ` Johan Herland
  1 sibling, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-08 21:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Schindelin, trast, tavestbo, git, chriscool, spearce

On Tuesday 08 September 2009, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > On Tue, 8 Sep 2009, Johan Herland wrote:
> >> Algorithm / Notes tree   git log -n10 (x100)   git log --all
> >> ------------------------------------------------------------
> >> next / no-notes                4.77s              63.84s
> >>
> >> before / no-notes              4.78s              63.90s
> >> before / no-fanout            56.85s              65.69s
> >>
> >> 16tree / no-notes              4.77s              64.18s
> >> 16tree / no-fanout            30.35s              65.39s
> >> 16tree / 2_38                  5.57s              65.42s
> >> 16tree / 2_2_36                5.19s              65.76s
> >>
> >> flexible / no-notes            4.78s              63.91s
> >> flexible / no-fanout          30.34s              65.57s
> >> flexible / 2_38                5.57s              65.46s
> >> flexible / 2_2_36              5.18s              65.72s
> >> flexible / ym                  5.13s              65.66s
> >> flexible / ym_2_38             5.08s              65.63s
> >> flexible / ymd                 5.30s              65.45s
> >> flexible / ymd_2_38            5.29s              65.90s
> >> flexible / y_m                 5.11s              65.72s
> >> flexible / y_m_2_38            5.08s              65.67s
> >> flexible / y_m_d               5.06s              65.50s
> >> flexible / y_m_d_2_38          5.07s              65.79s
> >
> > I can see that some people may think that date-based fan-out is the
> > cat's ass,
> 
> Actually, my knee-jerk reaction was that 4.77 (next) vs 5.57 (16tree with
> 2_38) is already a good enough performance/simplicity tradeoff, and 5.57
> vs 5.08 (16tree with ym_2_38) probably does not justify the risk of worst
> case behaviour that can come from possible mismatch between the access
> pattern and the date-optimized tree layout.

Yes, 16tree / 2_38 looks like a reasonable tradeoff when you look at the 
absolute numbers, but it's also interesting to highlight the actual cost of 
doing the notes lookup. In that case, we see that 16tree / 2_38 costs 0.80s, 
whereas flexible / ym_2_38 only costs 0.31s, i.e. less than half the cost of 
the former...
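
[Editorial sketch] The lookup cost quoted here is simply each timing
minus the 'next / no-notes' baseline from the first column of the table.
A minimal check, using only numbers from that table (the helper name
lookup_cost is made up for this illustration):

```python
# Seconds for "git log -n10 (x100)", taken from the quoted table.
BASELINE = 4.77  # next / no-notes
TIMINGS = {
    "16tree / 2_38": 5.57,
    "flexible / ym_2_38": 5.08,
}


def lookup_cost(timing, baseline=BASELINE):
    """Time attributable to notes lookup: total minus the no-notes baseline."""
    return round(timing - baseline, 2)


costs = {name: lookup_cost(t) for name, t in TIMINGS.items()}
ratio = costs["flexible / ym_2_38"] / costs["16tree / 2_38"]

for name, cost in costs.items():
    print(f"{name}: {cost:.2f}s over baseline")
print(f"ym_2_38 costs {ratio:.0%} of what 2_38 costs")
```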

> But that only argues against supporting _only_ date-optimized layout.
> 
> Support of "flexible layout" is not that flexible as its name suggests;
> one single note tree needs to have a uniform fanout strategy.

Actually, the uniform strategy is only required at each separate level. You 
are free to vary the strategy within independent subtrees. I.e. in the case 
where you have 1 note from 2007, and 1000 notes from 2008, you are free to 
use a mix of date-based and SHA1-based structures, like this:

  y2007/1234567...
  y2008/m01/d01/2345678...
  y2008/m01/d01/3456789...
  y2008/m01/d02/45/67890...
  y2008/m01/d02/56/78901...
  y2008/m01/d02/67/89012...
  ...
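
[Editorial sketch] The mixed layout above can be modelled as "zero or
more date levels, then zero or more SHA1 fanout levels, then the rest of
the SHA1". The function below is a hypothetical illustration of that
path shape only; note_path and its parameters are not part of the patch
series, and the yYYYY/mMM/dDD naming is inferred from the example paths.

```python
from datetime import date


def note_path(sha1_hex, commit_date, date_levels="", sha1_widths=()):
    """Build a notes-tree path for one annotated commit.

    date_levels: any prefix of "ymd" (assumed naming: yYYYY, mMM, dDD).
    sha1_widths: number of hex chars consumed by each SHA1 fanout level.
    """
    if len(sha1_hex) != 40:
        raise ValueError("expected a 40-char hex SHA1")
    fields = {
        "y": f"y{commit_date.year:04d}",
        "m": f"m{commit_date.month:02d}",
        "d": f"d{commit_date.day:02d}",
    }
    parts = [fields[level] for level in date_levels]
    rest = sha1_hex
    for width in sha1_widths:
        parts.append(rest[:width])
        rest = rest[width:]
    parts.append(rest)  # whatever is left names the note blob itself
    return "/".join(parts)


sha = "4567890" + "0" * 33
# A sparse year needs no further fanout; a busy day gets a 2-char SHA1 level.
print(note_path(sha, date(2007, 5, 1), date_levels="y"))
print(note_path(sha, date(2008, 1, 2), date_levels="ymd", sha1_widths=[2]))
```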

> > - I find the restriction to commits rather limiting.
> 
> Yeah, we would not want to be surprised to find many people want to
> annotate non-commits with this mechanism.

We could arbitrarily set the "commit date" for non-commit objects to the 
epoch, so that they can still be represented in a date-based fanout. (Of 
course, the notes code should be smart enough to choose a more optimal 
fanout if the number of non-commit notes is significant).

> > - most of the performance difference between the date-based and the
> > SHA-1 based fan-out looks to me as if the issue was the top-level tree.
> > Basically, this tree has to be read _every_ time _anybody_ wants to
> > read a note.
> 
> A comparison between 'next' and another algorithm that opens the
>  top-level notes tree object and returns "I did not find any note"
>  without doing anything else would reveal that cost.  But when you are
>  doing "log -n10" (or "log --all"), you would read the notes top-level
>  tree once, and it is likely to be cached in the obj_hash[] (or in
>  delta_base cache) already for the remaining invocations, even if notes
>  mechanism does not do its own cache, which I think it does, no?

Yes it does, since Dscho's original hash_map based implementation, in fact.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCHv5 00/14] git notes
  2009-09-08 21:39               ` Shawn O. Pearce
@ 2009-09-08 21:57                 ` Sverre Rabbelier
  0 siblings, 0 replies; 58+ messages in thread
From: Sverre Rabbelier @ 2009-09-08 21:57 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Junio C Hamano, Johannes Schindelin, Johan Herland, git, trast,
	tavestbo, git, chriscool

Heya,

On Tue, Sep 8, 2009 at 23:39, Shawn O. Pearce<spearce@spearce.org> wrote:
> Uh, but the natural way to index those is by commit, and each
> different revision of a change is a different commit.  Why delete
> the prior revision information and move it to the final commit note?

Ah, I didn't realize you would push the notes before the final revision is made.

> Someone who has the prior revisions in their reflog and is doing
> `git log -g --notes` might want to see that annotation.

It would make more sense to have multiple notes then, but wouldn't you
want them to annotate the original commit, rather than the final one?

-- 
Cheers,

Sverre Rabbelier


* Re: [PATCHv5 00/14] git notes
  2009-09-08 15:53           ` Johannes Schindelin
@ 2009-09-08 22:46             ` Johan Herland
  2009-09-10  6:23               ` Stephen R. van den Berg
  0 siblings, 1 reply; 58+ messages in thread
From: Johan Herland @ 2009-09-08 22:46 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, trast, tavestbo, git, chriscool, spearce

On Tuesday 08 September 2009, Johannes Schindelin wrote:
> On Tue, 8 Sep 2009, Johan Herland wrote:
> > On Tuesday 08 September 2009, Johannes Schindelin wrote:
> > > I can see that some people may think that date-based fan-out is the
> > > cat's ass, but I have to warn that we have no idea how notes will be
> > > used,
> >
> > I don't agree. Although we will certainly see many more use cases for
> > notes, I believe that the vast majority of them can be placed in one of
> > two categories:
> 
> My experience with Git is that having beliefs how my work is used was a
> constant source of surprise.

And you believe that a system that only allows SHA1-based fanout schemes is 
better equipped to tackle such surprises than a system that provides both 
date-based and SHA1-based fanout schemes?

> > > - I find the restriction to commits rather limiting.
> >
> > I see your point, but I don't agree until I see a compelling case for
> > annotating a non-commit.
> 
> My point is that it is too late by then, if you don't allow for a
>  flexible and still efficient scheme.

As I replied to Junio, we could use the epoch as a "commit date" for tree 
and blob objects, thus making them representable in a date-based fanout 
scheme (although if there are a significant number of non-commit notes, the 
code should be smart enough to find a better (SHA1-based probably) fanout 
scheme for those notes).

> > > - most of the performance difference between the date-based and the
> > >   SHA-1 based fan-out looks to me as if the issue was the top-level
> > >   tree. Basically, this tree has to be read _every_ time _anybody_
> > >   wants to read a note.
> >
> > Not sure what you're trying to say here. The top-level notes tree is
> > read (as in fill_tree_descriptor()) exactly _once_. After that, it is
> > cached by the internal data structure (until free_commit_notes() or
> > end-of-process).
> 
> By that reasoning, we do not need any fan-out scheme.
> 
> Keep in mind: reading a large tree object takes a long time.  That's why
> we started fan-out.  Reading a large number of tree objects also takes a
> long time.  That's why I propagated flexible fan-out that is only read-in
> on demand.

Not sure where you're going with this. Of course we want to strike an 
optimal balance between the size of tree objects and the number of tree 
objects. Nobody is arguing about that. Both SHA1-based and (in the most 
common cases) date-based schemes can be used to achieve this balance. But 
using date-based fanout has the added advantage of providing better 
performance (both runtime- and memory-wise) when looking up notes in a 
chronological order.

> > > But I think that having a dynamic fan-out that can even put blobs
> > > into the top-level tree (nothing prevents us from doing that, right?)
> >
> > Well, the "flexible" code does add the new requirement that all entries
> > in a notes (sub)tree object must follow the same scheme, i.e. you
> > cannot have:
> >
> >   /12/34567890123456789012345678901234567890
> >   /2345/678901234567890123456789012345678901
> >
> > but you can have
> >
> >   /12/34567890123456789012345678901234567890
> >   /23/45/678901234567890123456789012345678901
> 
> Umm, why?  Is there any good technical reason?

In the date-based parts of notes tree, there are very good reasons for doing 
so: The code peeks at the first tree entry in order to determine what kind 
of date-based fanout is used in the current tree object. Subsequent entries 
(in that tree object) that do not follow the same format are 
skipped/ignored.

The SHA1-based fanout code has not changed since the last iteration, so this 
extra requirement is not absolutely necessary for the SHA1-based parts of 
the notes tree. However, the extra requirement does guarantee that commit 
notes have exactly one unique location in the notes tree, and thus relieves 
us of having to keep searching for alternative notes locations, and 
concatenate the notes found.
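
[Editorial sketch] The "peek at the first entry, skip what does not
match" rule described above could look roughly like this. The entry-name
patterns are assumptions modelled on the example paths in this thread;
classify and entries_in_scheme are hypothetical names, not functions
from the patch series.

```python
import re

# Hypothetical entry-name patterns (yYYYY, mMM, dDD); the real parser may differ.
DATE_PATTERNS = {
    "year": re.compile(r"y\d{4}"),
    "month": re.compile(r"m\d{2}"),
    "day": re.compile(r"d\d{2}"),
}
# SHA1 components: an even number of lowercase hex chars, at most 40.
SHA1_PATTERN = re.compile(r"(?:[0-9a-f]{2}){1,20}")


def classify(name):
    """Return a scheme label for one tree entry name, or None if unrecognized."""
    for kind, pattern in DATE_PATTERNS.items():
        if pattern.fullmatch(name):
            return kind
    if SHA1_PATTERN.fullmatch(name):
        return f"sha1:{len(name)}"  # the width is part of the scheme
    return None


def entries_in_scheme(names):
    """Pick the scheme from the first entry; keep only entries that follow it."""
    if not names:
        return None, []
    scheme = classify(names[0])
    if scheme is None:
        return None, []
    return scheme, [n for n in names if classify(n) == scheme]


# "2345" does not follow the 2-char scheme set by "12", so it is skipped.
print(entries_in_scheme(["12", "23", "2345"]))
print(entries_in_scheme(["y2007", "y2008", "ab"]))
```

Because every entry in a tree must follow one scheme, a given commit
can live at only one path, which is the uniqueness property mentioned
above.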

> > > The real question for me, therefore, is: what is the optimal way to
> > > strike the balance between size of the tree objects (which we want to
> > > be small, so that unpacking them is fast)  and depth of the fan-out
> > > (which we want to be shallow to avoid reading worst-case 39 tree
> > > objects to get at one single note).
> >
> > s/39/19/ (each fanout must use at least 2 chars of the 40-char SHA1)
> 
> That is another unnecessary restriction that could cost you dearly.  Just
> think what happens if it turns out that the optimal number of tree items
> is closer to 16 than to 255...

The code can easily be rewritten to allow for "odd" fanouts (1/39, 1/1/38, 
etc.). Feel free to submit a patch.
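
[Editorial sketch] The 39-versus-19 figures and the 16-versus-255
remark both follow from how many hex characters each fanout level
consumes. The helpers below are made up for this illustration and
assume the leaf name keeps at least one level's worth of characters.

```python
SHA1_HEX_LEN = 40


def max_fanout_levels(chars_per_level):
    """Deepest possible fanout when every level uses the same width.

    Assumption: the final (leaf) name keeps at least chars_per_level chars.
    """
    return SHA1_HEX_LEN // chars_per_level - 1


def entries_per_level(chars_per_level):
    """Upper bound on distinct subtree names at one hex fanout level."""
    return 16 ** chars_per_level


for width in (1, 2):
    print(f"{width} char(s) per level: up to {max_fanout_levels(width)} levels, "
          f"up to {entries_per_level(width)} entries per tree")
```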

I was, however, naive enough to assume that when git.git decided on using 
2/38 fanout for its loose objects, then some performance-related thoughts
went into that decision. If there are indications that multiple-of-2-type 
fanouts are not optimal, we should probably reconsider.


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCHv5 00/14] git notes
  2009-09-08 22:46             ` Johan Herland
@ 2009-09-10  6:23               ` Stephen R. van den Berg
  0 siblings, 0 replies; 58+ messages in thread
From: Stephen R. van den Berg @ 2009-09-10  6:23 UTC (permalink / raw)
  To: Johan Herland
  Cc: Johannes Schindelin, git, Junio C Hamano, trast, tavestbo, git,
	chriscool, spearce

Johan Herland wrote:
>I was, however, naive enough to assume that when git.git decided on using 
>2/38 fanout for its loose objects, then some performance-related thoughts
>went into that decision. If there are indications that multiple-of-2-type 

I presume that that choice was based on the fact that on a typical filesystem
(e.g. ext2 old-style) without directory indexing, the sort-of optimal fanout
should aim for no more than 100-200 directory entries per directory.
-- 
Sincerely,
           Stephen R. van den Berg.

Mommy, what happens to your files when you die?


* Re: [PATCHv5 00/14] git notes
  2009-09-08 12:36         ` Johan Herland
  2009-09-08 15:53           ` Johannes Schindelin
@ 2009-09-10  9:25           ` Johan Herland
  1 sibling, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-10  9:25 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, trast, tavestbo, git, chriscool, spearce

On Tuesday 08 September 2009, Johan Herland wrote:
> On Tuesday 08 September 2009, Johannes Schindelin wrote:
> > On Tue, 8 Sep 2009, Johan Herland wrote:
> > > Algorithm / Notes tree   git log -n10 (x100)   git log --all
> > > ------------------------------------------------------------
> > > next / no-notes                4.77s              63.84s
> > >
> > > before / no-notes              4.78s              63.90s
> > > before / no-fanout            56.85s              65.69s
> > >
> > > 16tree / no-notes              4.77s              64.18s
> > > 16tree / no-fanout            30.35s              65.39s
> > > 16tree / 2_38                  5.57s              65.42s
> > > 16tree / 2_2_36                5.19s              65.76s
> > >
> > > flexible / no-notes            4.78s              63.91s
> > > flexible / no-fanout          30.34s              65.57s
> > > flexible / 2_38                5.57s              65.46s
> > > flexible / 2_2_36              5.18s              65.72s
> > > flexible / ym                  5.13s              65.66s
> > > flexible / ym_2_38             5.08s              65.63s
> > > flexible / ymd                 5.30s              65.45s
> > > flexible / ymd_2_38            5.29s              65.90s
> > > flexible / y_m                 5.11s              65.72s
> > > flexible / y_m_2_38            5.08s              65.67s
> > > flexible / y_m_d               5.06s              65.50s
> > > flexible / y_m_d_2_38          5.07s              65.79s

[snip]

> > - I'd love to see performance numbers for less than 157118 notes.
> > Don't get me wrong, it is good to see the worst-case scenario in
> > terms of notes/commits ratio.  But it will hardly be the common case,
> > and I very much would like to optimize for the common case.
> >
> >   So, I'd appreciate if you could do the tests with something like
> > 500 notes, randomly spread over the commits (rationale: my original
> > understanding was that the notes could amend commit messages, and
> > that is much more likely to be done with relatively old commits that
> > you cannot change anymore).
> 
> Ok. I will try to test that.

Here are the results of the 500-notes-in-kernel-repo test:

Algorithm / Notes tree   git log -n10 (x100)   git log --all
------------------------------------------------------------
next / no-notes                 4.83s             64.78s

before / no-notes               4.84s             64.76s
before / no-fanout              4.98s             64.89s

16tree / no-notes               4.84s             64.61s
16tree / no-fanout              4.92s             64.68s
16tree / 2_38                   4.85s             64.45s
16tree / 2_2_36                 4.85s             64.63s

flexible / no-notes             4.84s             64.82s
flexible / no-fanout            4.91s             65.01s
flexible / 2_38                 4.85s             64.93s
flexible / 2_2_36               4.85s             64.63s
flexible / ym                   4.83s             64.63s
flexible / ym_2_38              4.86s             64.72s
flexible / ymd                  4.91s             64.74s
flexible / ymd_2_38             4.91s             64.56s
flexible / y_m                  4.86s             64.76s
flexible / y_m_2_38             4.86s             64.71s
flexible / y_m_d                4.86s             64.73s
flexible / y_m_d_2_38           4.84s             64.50s

I don't like the noise level in the second column ('git log --all'). Then 
again, I don't find that column very interesting (it's mostly there to 
verify that we don't have any abysmal worst-case behaviours in the notes 
code).

The first column is fairly nice and tidy, though. At a first glance it shows 
pretty much the same results as the 157000-notes table previously posted. 
Obviously the abysmal performance of no-fanout is gone (500 notes in a 
single tree object is not _that_ bad), although a 2/38-fanout is still a 
better choice for 500 notes (but 2/2/36 does not provide any additional 
improvement).

From this we can start to guess that the threshold for moving from no fanout 
to 2/38 is somewhere below 500 notes, while the threshold for moving from 
2/38 to 2/2/36 is between 500 and ~157000 notes (probably much closer to 
157000 than to 500; I wouldn't be surprised if ~256 entries per level turns 
out to be a good threshold).

The date-based fanout performs on par with the SHA1-based fanout, although 
it's hard to say anything conclusively when the numbers are as close as 
this. However, the ymd and ymd_2_38 fanout probably show signs of too much 
overhead (too many levels) at only 500 notes. This is not surprising.

My gut feeling tells me that moving from 'no-fanout' to either '2_38' or 
'ym' is a good idea at ~256 notes. Then, if we went with '2_38', we'd have 
to switch to '2_2_36' at ~64K notes (i.e. when each /38 level reaches ~256 
notes). However, it seems that with 'ym', we could stick with it for much 
longer before having to consider switching to a different fanout alternative 
(probably 'ym_2_38' or 'y_m_d').
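
[Editorial sketch] The SHA1-only part of this gut feeling can be written
down as a small chooser. The ~256-entries-per-level threshold is the
guess from this mail, not a measured constant, and sha1_fanout_for is a
hypothetical name; the sketch also assumes SHA1s spread evenly over the
256 two-character subtrees.

```python
ENTRIES_PER_LEVEL = 256  # guessed threshold from the discussion above


def sha1_fanout_for(num_notes, threshold=ENTRIES_PER_LEVEL):
    """Pick a SHA1-based fanout for a given number of notes.

    Assumes notes are spread evenly, so each 2-char subtree holds
    roughly num_notes / 256 entries.
    """
    if num_notes <= threshold:
        return "no-fanout"
    if num_notes <= threshold * 256:  # each /38 level stays at or below ~256
        return "2_38"
    return "2_2_36"


for count in (100, 500, 157118):
    print(count, sha1_fanout_for(count))
```

With these thresholds the 500-notes test lands on '2_38' and the
157118-notes test on '2_2_36', which agrees with the first column of
both tables.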


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCHv5 00/14] git notes
  2009-09-08  2:26 [PATCHv5 00/14] git notes Johan Herland
                   ` (14 preceding siblings ...)
  2009-09-08  3:12 ` [PATCHv5 00/14] git notes Johan Herland
@ 2009-09-10 14:00 ` Geert Bosch
  2009-09-10 14:09   ` Michael J Gruber
  2009-09-12  0:11 ` Junio C Hamano
  16 siblings, 1 reply; 58+ messages in thread
From: Geert Bosch @ 2009-09-10 14:00 UTC (permalink / raw)
  To: Johan Herland
  Cc: gitster, git, Johannes.Schindelin, trast, tavestbo, git,
	chriscool, spearce

On Sep 7, 2009, at 22:26, Johan Herland wrote:
> Yet another iteration of the 'git notes' feature. Rebased on top of 'next':
> - Patches 1-9 are unchanged from (patches 1-7, 11-12 of) the last iteration.
> - Patch 10 teaches the notes code to free its data structures on request.
> - Patch 11 introduces the 16-tree notes lookup code that handles SHA1-based
>   fanout schemes. This is pretty much unchanged from patch 8 in the previous
>   iteration.
> - Patch 12 adds selftests that verify correct parsing of notes trees with
>   various SHA1-based fanouts.
> - Patch 13 introduces a flexible parser for a variety of date-based and
>   SHA1-based fanout schemes. This is the interesting part, as far as this
>   iteration is concerned.
> - Patch 14 adds selftests that verify correct parsing of notes trees with
>   various date-based fanouts.
>
> Note that the series does not yet include code for _writing_ notes into a
> suitably structured notes tree. That will be done in a later iteration.
>
> I have some performance numbers that I will send in a separate email.

Hi Johan,

I've been following this series with some interest, and am curious
why notes need to be stored in a separate data structure from regular
objects. Note that I'm not questioning the design (and certainly would
not want to, this late in the process), rather I'd like to learn
about the reasons.

I've wondered about this as well in the context of refs, reflog and
git config. In a completely unified model, every change to the
repository (except for the index, pack indices and working directory)
would be a commit of the .git/ directory (again excluding indices).
One of the advantages (besides allowing configuration management
of the repository itself in addition to its contents) would be that
no locking is ever required.

This would be just an implementation detail without necessarily
affecting the user interface other than direct inspection/modification
of the .git directory, which is similar to the move to packed refs.
Again, I'm not proposing to change anything, just wondering about
design rationale.

   -Geert

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCHv5 00/14] git notes
  2009-09-10 14:00 ` Geert Bosch
@ 2009-09-10 14:09   ` Michael J Gruber
  2009-09-10 14:12     ` Geert Bosch
  0 siblings, 1 reply; 58+ messages in thread
From: Michael J Gruber @ 2009-09-10 14:09 UTC (permalink / raw)
  To: Geert Bosch
  Cc: Johan Herland, gitster, git, Johannes.Schindelin, trast,
	tavestbo, chriscool, spearce

Geert Bosch venit, vidit, dixit 10.09.2009 16:00:
> On Sep 7, 2009, at 22:26, Johan Herland wrote:
...
> 
> Hi Johan,
> 
> I've been following this series with some interest, and am curious
> why notes need to be stored in a separate data structure from regular
> objects. Note that I'm not questioning the design (and certainly would

It's not separate, that's the point. They're stored as objects in trees,
just like anything else. The discussion about the structure is about how
to organize the tree structure, not actual subdirectories under .git/.

> not want to, this late in the process), rather I'd like to learn
> about the reasons.
> 
> I've wondered about this as well in the context of refs, reflog and
> git config. In a completely unified model, every change to the
> repository (except for the index, pack indices and working directory)
> would be a commit of the .git/ directory (again excluding indices).
> One of the advantages (besides allowing configuration management
> of the repository itself in addition to its contents) would be that
> no locking is ever required.

...and one of the disadvantages is that you are no longer in control of
your config once you pull from upstream. Config and reflog are inherently
private and local. The reflog does not even make sense outside a local
(per-repo) context.

For the config, one may think up a solution where parts of config are
shared (by storing them as objects and referencing them) and git asks
you before changing anything on pull/fetch. In a sense git submodule
does that already.

Cheers,
Michael

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCHv5 00/14] git notes
  2009-09-10 14:09   ` Michael J Gruber
@ 2009-09-10 14:12     ` Geert Bosch
  0 siblings, 0 replies; 58+ messages in thread
From: Geert Bosch @ 2009-09-10 14:12 UTC (permalink / raw)
  To: Michael J Gruber
  Cc: Johan Herland, gitster, git, Johannes.Schindelin, trast,
	tavestbo, chriscool, spearce


On Sep 10, 2009, at 10:09, Michael J Gruber wrote:

> It's not separate, that's the point. They're stored as objects in trees,
> just like anything else. The discussion about the structure is about how
> to organize the tree structure, not actual subdirectories under .git/.

Arghh, sorry for the noise.

   -Geert

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCHv5 00/14] git notes
  2009-09-08  2:26 [PATCHv5 00/14] git notes Johan Herland
                   ` (15 preceding siblings ...)
  2009-09-10 14:00 ` Geert Bosch
@ 2009-09-12  0:11 ` Junio C Hamano
  2009-09-12 15:52   ` Johan Herland
  16 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2009-09-12  0:11 UTC (permalink / raw)
  To: Johan Herland
  Cc: git, Johannes.Schindelin, trast, tavestbo, git, chriscool, spearce

Johan Herland <johan@herland.net> writes:

> Yet another iteration of the 'git notes' feature. Rebased on top of 'next':

By the way I didn't pick this up as it did not apply to any of my
branches.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCHv5 00/14] git notes
  2009-09-08  3:12 ` [PATCHv5 00/14] git notes Johan Herland
  2009-09-08  4:16   ` Junio C Hamano
@ 2009-09-12 15:50   ` Johan Herland
  2009-09-12 18:11     ` Shawn O. Pearce
  1 sibling, 1 reply; 58+ messages in thread
From: Johan Herland @ 2009-09-12 15:50 UTC (permalink / raw)
  To: git, spearce
  Cc: gitster, Johannes.Schindelin, trast, tavestbo, git, chriscool

On Tuesday 08 September 2009, Johan Herland wrote:
> Algorithm / Notes tree   git log -n10 (x100)   git log --all
> ------------------------------------------------------------
> before / no-notes              4.78s              63.90s
> before / no-fanout            56.85s              65.69s
> 
> 16tree / no-notes              4.77s              64.18s
> 16tree / no-fanout            30.35s              65.39s
> 16tree / 2_38                  5.57s              65.42s
> 16tree / 2_2_36                5.19s              65.76s
> 
> flexible / no-notes            4.78s              63.91s
> flexible / no-fanout          30.34s              65.57s
> flexible / 2_38                5.57s              65.46s
> flexible / 2_2_36              5.18s              65.72s
> flexible / ym                  5.13s              65.66s
> flexible / ym_2_38             5.08s              65.63s
> flexible / ymd                 5.30s              65.45s
> flexible / ymd_2_38            5.29s              65.90s
> flexible / y_m                 5.11s              65.72s
> flexible / y_m_2_38            5.08s              65.67s
> flexible / y_m_d               5.06s              65.50s
> flexible / y_m_d_2_38          5.07s              65.79s

Ok, I have been pondering this back and forth, and I'm not sure what to 
think. It seems allowing (not mandating) date-based fanout gives a slight 
runtime advantage if used correctly, but I'm not sure the slight runtime 
improvement is worth the added code complexity and worse maintainability. 
I'm starting to lean towards SHA1-based fanout being "good enough".

But when we look at the memory consumption, it's clear that SHA1-based 
fanout loses out (because you cannot throw away subtrees without fear that 
they will be needed again soon). Then again, memory consumption has not been 
the major focus of the git project, and 14 MB (for holding all ~157000 notes 
in the kernel repo example) is not excessive for an average desktop 
computer.
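
For readers unfamiliar with the scheme names in the table above, here is a
minimal sketch of how an SHA1-based fanout path could be derived. It assumes
that "2_38" means a 2-hex-digit directory followed by a 38-hex-digit file
name, and that "2_2_36" means two such directory levels. That reading of the
labels, the function name note_fanout_path() and its `levels` parameter are
illustrative assumptions, not code from this series. The date-based schemes
(ym, y_m_d, ...) would also need the commit date and are not covered here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Build the tree path of a note under an SHA1-based fanout.
 * `levels` (hypothetical parameter) is the number of 2-hex-digit
 * directory levels: 0 = "no-fanout", 1 = "2_38", 2 = "2_2_36".
 * `out` needs room for 40 + levels + 1 bytes.
 * Returns 0 on success, -1 on bad input.
 */
int note_fanout_path(const char *hex, int levels, char *out, size_t outlen)
{
	size_t pos = 0;
	int i;

	if (!hex || !out || strlen(hex) != 40 || levels < 0 || levels > 19)
		return -1;
	if (outlen < (size_t)(40 + levels + 1))
		return -1;

	/* each level consumes two hex digits and adds a '/' */
	for (i = 0; i < levels; i++) {
		out[pos++] = hex[2 * i];
		out[pos++] = hex[2 * i + 1];
		out[pos++] = '/';
	}
	/* the remaining hex digits form the file name */
	strcpy(out + pos, hex + 2 * levels);
	return 0;
}
```

With levels = 1, the commit 268048bfb8a1fb38e703baceb8ab235421bf80c5 maps to
"26/8048bfb8a1fb38e703baceb8ab235421bf80c5"; with levels = 2 it maps to
"26/80/48bfb8a1fb38e703baceb8ab235421bf80c5".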

Shawn, do you have any additional defence for the date-based fanout? Are 
there untested reasonable scenarios that would show the benefits of date-
based fanout? How does the plan for notes usage in your code-review thingy 
compare to my test scenario?


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCHv5 00/14] git notes
  2009-09-12  0:11 ` Junio C Hamano
@ 2009-09-12 15:52   ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 " Johan Herland
                       ` (14 more replies)
  0 siblings, 15 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 15:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes.Schindelin, trast, tavestbo, git, chriscool, spearce

On Saturday 12 September 2009, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > Yet another iteration of the 'git notes' feature. Rebased on top of
> > 'next':
> 
> By the way I didn't pick this up as it did not apply to any of my
> branches.

Weird... I just rebased it from an old 'next' to current 'next' without any 
conflicts.

Will resend immediately.

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCHv6 00/14] git notes
  2009-09-12 15:52   ` Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 01/14] Introduce commit notes Johan Herland
                       ` (13 subsequent siblings)
  14 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

The sixth iteration of the 'git notes' feature. Rebased on top of current
'next', otherwise unchanged from v5.


Have fun! :)

...Johan


Johan Herland (9):
  Teach "-m <msg>" and "-F <file>" to "git notes edit"
  fast-import: Add support for importing commit notes
  t3302-notes-index-expensive: Speed up create_repo()
  Add flags to get_commit_notes() to control the format of the note string
  Teach notes code to free its internal data structures on request.
  Teach the notes lookup code to parse notes trees with various fanout schemes
  Selftests verifying semantics when loading notes trees with various fanouts
  Allow flexible organization of notes trees, using both commit date and SHA1
  Add test cases for various date-based fanouts

Johannes Schindelin (5):
  Introduce commit notes
  Add a script to edit/inspect notes
  Speed up git notes lookup
  Add an expensive test for git-notes
  Add '%N'-format for pretty-printing commit notes

 .gitignore                        |    1 +
 Documentation/config.txt          |   13 +
 Documentation/git-fast-import.txt |   45 +++-
 Documentation/git-notes.txt       |   60 ++++
 Documentation/pretty-formats.txt  |    1 +
 Makefile                          |    3 +
 cache.h                           |    4 +
 command-list.txt                  |    1 +
 commit.c                          |    1 +
 config.c                          |    5 +
 environment.c                     |    1 +
 fast-import.c                     |   88 +++++-
 git-notes.sh                      |  121 +++++++
 notes.c                           |  673 +++++++++++++++++++++++++++++++++++++
 notes.h                           |   12 +
 pretty.c                          |   10 +
 t/t3301-notes.sh                  |  150 ++++++++
 t/t3302-notes-index-expensive.sh  |  118 +++++++
 t/t3303-notes-subtrees.sh         |  201 +++++++++++
 t/t9300-fast-import.sh            |  166 +++++++++
 20 files changed, 1664 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/git-notes.txt
 create mode 100755 git-notes.sh
 create mode 100644 notes.c
 create mode 100644 notes.h
 create mode 100755 t/t3301-notes.sh
 create mode 100755 t/t3302-notes-index-expensive.sh
 create mode 100755 t/t3303-notes-subtrees.sh

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCHv6 01/14] Introduce commit notes
  2009-09-12 15:52   ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 " Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 02/14] Add a script to edit/inspect notes Johan Herland
                       ` (12 subsequent siblings)
  14 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce, Johannes Schindelin

From: Johannes Schindelin <Johannes.Schindelin@gmx.de>

Commit notes are blobs which are shown together with the commit
message.  These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.

The notes ref is a branch which contains "files" whose names are
the names of the corresponding commits (i.e. the SHA-1).
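
The precedence described above (environment variable, then config variable,
then built-in default) can be sketched in isolation as follows. The function
name resolve_notes_ref() and the macro NOTES_DEFAULT_REF are made up for this
illustration; the patch itself does the equivalent inline in
get_commit_notes().

```c
#include <stddef.h>

#define NOTES_DEFAULT_REF "refs/notes/commits"

/*
 * Pick the notes ref: the environment value wins if it is set at all,
 * then the configured value, then the built-in default.
 * Pass NULL for a source that is not set.
 */
const char *resolve_notes_ref(const char *env_value, const char *config_value)
{
	if (env_value)
		return env_value;
	if (config_value)
		return config_value;
	return NOTES_DEFAULT_REF;
}
```

A caller would typically pass getenv("GIT_NOTES_REF") as the first argument
and the value read from core.notesRef as the second. Note that in this sketch,
as in the C code of the patch, an environment variable set to the empty string
still overrides the config.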

The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.

This patch has been improved by the following contributions:
- Thomas Rast: fix core.notesRef documentation
- Tor Arne Vestbø: fix printing of multi-line notes
- Alex Riesen: Using char array instead of char pointer costs less BSS

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt |   13 +++++++++
 Makefile                 |    2 +
 cache.h                  |    4 +++
 commit.c                 |    1 +
 config.c                 |    5 +++
 environment.c            |    1 +
 notes.c                  |   68 ++++++++++++++++++++++++++++++++++++++++++++++
 notes.h                  |    7 +++++
 pretty.c                 |    5 +++
 9 files changed, 106 insertions(+), 0 deletions(-)
 create mode 100644 notes.c
 create mode 100644 notes.h

diff --git a/Documentation/config.txt b/Documentation/config.txt
index cc156b8..32b0cdf 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -458,6 +458,19 @@ On some file system/operating system combinations, this is unreliable.
 Set this config setting to 'rename' there; However, This will remove the
 check that makes sure that existing object files will not get overwritten.
 
+core.notesRef::
+	When showing commit messages, also show notes which are stored in
+	the given ref.  This ref is expected to contain files named
+	after the full SHA-1 of the commit they annotate.
++
+If such a file exists in the given ref, the referenced blob is read, and
+appended to the commit message, separated by a "Notes:" line.  If the
+given ref itself does not exist, it is not an error, but means that no
+notes should be printed.
++
+This setting defaults to "refs/notes/commits", and can be overridden by
+the `GIT_NOTES_REF` environment variable.
+
 add.ignore-errors::
 	Tells 'git-add' to continue adding files when some files cannot be
 	added due to indexing errors. Equivalent to the '--ignore-errors'
diff --git a/Makefile b/Makefile
index bde2acd..09180ac 100644
--- a/Makefile
+++ b/Makefile
@@ -429,6 +429,7 @@ LIB_H += ll-merge.h
 LIB_H += log-tree.h
 LIB_H += mailmap.h
 LIB_H += merge-recursive.h
+LIB_H += notes.h
 LIB_H += object.h
 LIB_H += pack.h
 LIB_H += pack-refs.h
@@ -513,6 +514,7 @@ LIB_OBJS += match-trees.o
 LIB_OBJS += merge-file.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += name-hash.o
+LIB_OBJS += notes.o
 LIB_OBJS += object.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-refs.o
diff --git a/cache.h b/cache.h
index 30a7a16..3d1a355 100644
--- a/cache.h
+++ b/cache.h
@@ -372,6 +372,8 @@ static inline enum object_type object_type(unsigned int mode)
 #define GITATTRIBUTES_FILE ".gitattributes"
 #define INFOATTRIBUTES_FILE "info/attributes"
 #define ATTRIBUTE_MACRO_PREFIX "[attr]"
+#define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
+#define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
 
 extern int is_bare_repository_cfg;
 extern int is_bare_repository(void);
@@ -566,6 +568,8 @@ enum object_creation_mode {
 
 extern enum object_creation_mode object_creation_mode;
 
+extern char *notes_ref_name;
+
 extern int grafts_replace_parents;
 
 #define GIT_REPO_VERSION 0
diff --git a/commit.c b/commit.c
index a6c6f70..a0a77a6 100644
--- a/commit.c
+++ b/commit.c
@@ -5,6 +5,7 @@
 #include "utf8.h"
 #include "diff.h"
 #include "revision.h"
+#include "notes.h"
 
 int save_commit_buffer = 1;
 
diff --git a/config.c b/config.c
index f21530c..42bef56 100644
--- a/config.c
+++ b/config.c
@@ -467,6 +467,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.notesref")) {
+		notes_ref_name = xstrdup(value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.pager"))
 		return git_config_string(&pager_program, var, value);
 
diff --git a/environment.c b/environment.c
index 5de6837..571ab56 100644
--- a/environment.c
+++ b/environment.c
@@ -49,6 +49,7 @@ enum push_default_type push_default = PUSH_DEFAULT_MATCHING;
 #define OBJECT_CREATION_MODE OBJECT_CREATION_USES_HARDLINKS
 #endif
 enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
+char *notes_ref_name;
 int grafts_replace_parents = 1;
 
 /* Parallel index stat data preload? */
diff --git a/notes.c b/notes.c
new file mode 100644
index 0000000..401966d
--- /dev/null
+++ b/notes.c
@@ -0,0 +1,68 @@
+#include "cache.h"
+#include "commit.h"
+#include "notes.h"
+#include "refs.h"
+#include "utf8.h"
+#include "strbuf.h"
+
+static int initialized;
+
+void get_commit_notes(const struct commit *commit, struct strbuf *sb,
+		const char *output_encoding)
+{
+	static const char utf8[] = "utf-8";
+	struct strbuf name = STRBUF_INIT;
+	unsigned char sha1[20];
+	char *msg, *msg_p;
+	unsigned long linelen, msglen;
+	enum object_type type;
+
+	if (!initialized) {
+		const char *env = getenv(GIT_NOTES_REF_ENVIRONMENT);
+		if (env)
+			notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
+		else if (!notes_ref_name)
+			notes_ref_name = GIT_NOTES_DEFAULT_REF;
+		if (notes_ref_name && read_ref(notes_ref_name, sha1))
+			notes_ref_name = NULL;
+		initialized = 1;
+	}
+
+	if (!notes_ref_name)
+		return;
+
+	strbuf_addf(&name, "%s:%s", notes_ref_name,
+			sha1_to_hex(commit->object.sha1));
+	if (get_sha1(name.buf, sha1))
+		return;
+
+	if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
+			type != OBJ_BLOB)
+		return;
+
+	if (output_encoding && *output_encoding &&
+			strcmp(utf8, output_encoding)) {
+		char *reencoded = reencode_string(msg, output_encoding, utf8);
+		if (reencoded) {
+			free(msg);
+			msg = reencoded;
+			msglen = strlen(msg);
+		}
+	}
+
+	/* we will end the annotation by a newline anyway */
+	if (msglen && msg[msglen - 1] == '\n')
+		msglen--;
+
+	strbuf_addstr(sb, "\nNotes:\n");
+
+	for (msg_p = msg; msg_p < msg + msglen; msg_p += linelen + 1) {
+		linelen = strchrnul(msg_p, '\n') - msg_p;
+
+		strbuf_addstr(sb, "    ");
+		strbuf_add(sb, msg_p, linelen);
+		strbuf_addch(sb, '\n');
+	}
+
+	free(msg);
+}
diff --git a/notes.h b/notes.h
new file mode 100644
index 0000000..79d21b6
--- /dev/null
+++ b/notes.h
@@ -0,0 +1,7 @@
+#ifndef NOTES_H
+#define NOTES_H
+
+void get_commit_notes(const struct commit *commit, struct strbuf *sb,
+		const char *output_encoding);
+
+#endif
diff --git a/pretty.c b/pretty.c
index f5983f8..e25db81 100644
--- a/pretty.c
+++ b/pretty.c
@@ -6,6 +6,7 @@
 #include "string-list.h"
 #include "mailmap.h"
 #include "log-tree.h"
+#include "notes.h"
 #include "color.h"
 
 static char *user_format;
@@ -975,5 +976,9 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
 	 */
 	if (fmt == CMIT_FMT_EMAIL && sb->len <= beginning_of_body)
 		strbuf_addch(sb, '\n');
+
+	if (fmt != CMIT_FMT_ONELINE)
+		get_commit_notes(commit, sb, encoding);
+
 	free(reencoded);
 }
-- 
1.6.4.304.g1365c.dirty
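
The formatting step of get_commit_notes() in the patch above (a "Notes:"
header followed by each note line indented by four spaces) can be tried
outside of git with a sketch like the one below. It uses plain malloc()
instead of git's strbuf API, and the name format_note() is hypothetical.
Unlike the patch, which skips empty notes before reaching this step, the
sketch returns just the header for an empty input.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated string holding "\nNotes:\n" followed by
 * each line of `msg` indented by four spaces. The caller frees it.
 * Returns NULL if allocation fails.
 */
char *format_note(const char *msg)
{
	size_t msglen = strlen(msg);
	size_t lines = 1, i, pos;
	const char *p, *end;
	char *out;

	/* drop one trailing newline; every line gets its own below */
	if (msglen && msg[msglen - 1] == '\n')
		msglen--;

	for (i = 0; i < msglen; i++)
		if (msg[i] == '\n')
			lines++;

	/* header (8) + per line: 4 spaces and a newline (5) + text + NUL */
	out = malloc(8 + lines * 5 + msglen + 1);
	if (!out)
		return NULL;

	memcpy(out, "\nNotes:\n", 8);
	pos = 8;
	end = msg + msglen;
	for (p = msg; p < end; ) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));
		size_t linelen = nl ? (size_t)(nl - p) : (size_t)(end - p);

		memcpy(out + pos, "    ", 4);
		pos += 4;
		memcpy(out + pos, p, linelen);
		pos += linelen;
		out[pos++] = '\n';
		p += linelen + 1;	/* skip the line and its newline */
	}
	out[pos] = '\0';
	return out;
}
```

For the multi-line note used in t3301 ("b3", "c3c3c3c3", "d3d3d3"), this
yields the same "Notes:" block that the test expects after the commit message.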

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCHv6 02/14] Add a script to edit/inspect notes
  2009-09-12 15:52   ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 " Johan Herland
  2009-09-12 16:08     ` [PATCHv6 01/14] Introduce commit notes Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 03/14] Speed up git notes lookup Johan Herland
                       ` (11 subsequent siblings)
  14 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce, Johannes Schindelin

From: Johannes Schindelin <Johannes.Schindelin@gmx.de>

The script 'git notes' allows you to edit and show commit notes, by
calling either

	git notes show <commit>

or

	git notes edit <commit>

This patch has been improved by the following contributions:
- Tor Arne Vestbø: fix printing of multi-line notes
- Michael J Gruber: test and handle empty notes gracefully
- Thomas Rast:
  - only clean up message file when editing
  - use GIT_EDITOR and core.editor over VISUAL/EDITOR
  - t3301: fix confusing quoting in test for valid notes ref
  - t3301: use test_must_fail instead of !
  - refuse to edit notes outside refs/notes/
- Junio C Hamano: tests: fix "export var=val"
- Christian Couder: documentation: fix 'linkgit' macro in "git-notes.txt"
- Johan Herland: minor cleanup and bugfixing in git-notes.sh (v2)
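
The "refuse to edit notes outside refs/notes/" item above amounts to a prefix
check on the ref name. The script does this with a shell parameter expansion;
a C equivalent might look like the sketch below, where is_notes_ref() is a
hypothetical helper and not part of this series.

```c
#include <string.h>

/* Returns 1 if `ref` starts with "refs/notes/", else 0. */
int is_notes_ref(const char *ref)
{
	static const char prefix[] = "refs/notes/";

	return ref && !strncmp(ref, prefix, sizeof(prefix) - 1);
}
```

This accepts "refs/notes/commits" and "refs/notes/bugzilla", and rejects
"refs/heads/bogus", "refs/remotes/bogus" and "/", matching the cases the tests
in t3301 exercise.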

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 .gitignore                  |    1 +
 Documentation/git-notes.txt |   46 +++++++++++++++++
 Makefile                    |    1 +
 command-list.txt            |    1 +
 git-notes.sh                |   73 +++++++++++++++++++++++++++
 t/t3301-notes.sh            |  114 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 236 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-notes.txt
 create mode 100755 git-notes.sh
 create mode 100755 t/t3301-notes.sh

diff --git a/.gitignore b/.gitignore
index 47672b0..c7f9960 100644
--- a/.gitignore
+++ b/.gitignore
@@ -86,6 +86,7 @@ git-mktag
 git-mktree
 git-name-rev
 git-mv
+git-notes
 git-pack-redundant
 git-pack-objects
 git-pack-refs
diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt
new file mode 100644
index 0000000..7136016
--- /dev/null
+++ b/Documentation/git-notes.txt
@@ -0,0 +1,46 @@
+git-notes(1)
+============
+
+NAME
+----
+git-notes - Add/inspect commit notes
+
+SYNOPSIS
+--------
+[verse]
+'git-notes' (edit | show) [commit]
+
+DESCRIPTION
+-----------
+This command allows you to add notes to commit messages, without
+changing the commit.  To discern these notes from the message stored
+in the commit object, the notes are indented like the message, after
+an unindented line saying "Notes:".
+
+To disable commit notes, you have to set the config variable
+core.notesRef to the empty string.  Alternatively, you can set it
+to a different ref, something like "refs/notes/bugzilla".  This setting
+can be overridden by the environment variable "GIT_NOTES_REF".
+
+
+SUBCOMMANDS
+-----------
+
+edit::
+	Edit the notes for a given commit (defaults to HEAD).
+
+show::
+	Show the notes for a given commit (defaults to HEAD).
+
+
+Author
+------
+Written by Johannes Schindelin <johannes.schindelin@gmx.de>
+
+Documentation
+-------------
+Documentation by Johannes Schindelin
+
+GIT
+---
+Part of the linkgit:git[7] suite
diff --git a/Makefile b/Makefile
index 09180ac..6d84be1 100644
--- a/Makefile
+++ b/Makefile
@@ -318,6 +318,7 @@ SCRIPT_SH += git-merge-one-file.sh
 SCRIPT_SH += git-merge-resolve.sh
 SCRIPT_SH += git-mergetool.sh
 SCRIPT_SH += git-mergetool--lib.sh
+SCRIPT_SH += git-notes.sh
 SCRIPT_SH += git-parse-remote.sh
 SCRIPT_SH += git-pull.sh
 SCRIPT_SH += git-quiltimport.sh
diff --git a/command-list.txt b/command-list.txt
index fb03a2e..4296941 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -74,6 +74,7 @@ git-mktag                               plumbingmanipulators
 git-mktree                              plumbingmanipulators
 git-mv                                  mainporcelain common
 git-name-rev                            plumbinginterrogators
+git-notes                               mainporcelain
 git-pack-objects                        plumbingmanipulators
 git-pack-redundant                      plumbinginterrogators
 git-pack-refs                           ancillarymanipulators
diff --git a/git-notes.sh b/git-notes.sh
new file mode 100755
index 0000000..f06c254
--- /dev/null
+++ b/git-notes.sh
@@ -0,0 +1,73 @@
+#!/bin/sh
+
+USAGE="(edit | show) [commit]"
+. git-sh-setup
+
+test -n "$3" && usage
+
+test -z "$1" && usage
+ACTION="$1"; shift
+
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="refs/notes/commits"
+
+COMMIT=$(git rev-parse --verify --default HEAD "$@") ||
+die "Invalid commit: $@"
+
+case "$ACTION" in
+edit)
+	if [ "${GIT_NOTES_REF#refs/notes/}" = "$GIT_NOTES_REF" ]; then
+		die "Refusing to edit notes in $GIT_NOTES_REF (outside of refs/notes/)"
+	fi
+
+	MSG_FILE="$GIT_DIR/new-notes-$COMMIT"
+	GIT_INDEX_FILE="$MSG_FILE.idx"
+	export GIT_INDEX_FILE
+
+	trap '
+		test -f "$MSG_FILE" && rm "$MSG_FILE"
+		test -f "$GIT_INDEX_FILE" && rm "$GIT_INDEX_FILE"
+	' 0
+
+	GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MSG_FILE"
+
+	CURRENT_HEAD=$(git show-ref "$GIT_NOTES_REF" | cut -f 1 -d ' ')
+	if [ -z "$CURRENT_HEAD" ]; then
+		PARENT=
+	else
+		PARENT="-p $CURRENT_HEAD"
+		git read-tree "$GIT_NOTES_REF" || die "Could not read index"
+		git cat-file blob :$COMMIT >> "$MSG_FILE" 2> /dev/null
+	fi
+
+	core_editor="$(git config core.editor)"
+	${GIT_EDITOR:-${core_editor:-${VISUAL:-${EDITOR:-vi}}}} "$MSG_FILE"
+
+	grep -v ^# < "$MSG_FILE" | git stripspace > "$MSG_FILE".processed
+	mv "$MSG_FILE".processed "$MSG_FILE"
+	if [ -s "$MSG_FILE" ]; then
+		BLOB=$(git hash-object -w "$MSG_FILE") ||
+			die "Could not write into object database"
+		git update-index --add --cacheinfo 0644 $BLOB $COMMIT ||
+			die "Could not write index"
+	else
+		test -z "$CURRENT_HEAD" &&
+			die "Will not initialise with empty tree"
+		git update-index --force-remove $COMMIT ||
+			die "Could not update index"
+	fi
+
+	TREE=$(git write-tree) || die "Could not write tree"
+	NEW_HEAD=$(echo Annotate $COMMIT | git commit-tree $TREE $PARENT) ||
+		die "Could not annotate"
+	git update-ref -m "Annotate $COMMIT" \
+		"$GIT_NOTES_REF" $NEW_HEAD $CURRENT_HEAD
+;;
+show)
+	git rev-parse -q --verify "$GIT_NOTES_REF":$COMMIT > /dev/null ||
+		die "No note for commit $COMMIT."
+	git show "$GIT_NOTES_REF":$COMMIT
+;;
+*)
+	usage
+esac
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
new file mode 100755
index 0000000..73e53be
--- /dev/null
+++ b/t/t3301-notes.sh
@@ -0,0 +1,114 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes'
+
+. ./test-lib.sh
+
+cat > fake_editor.sh << \EOF
+echo "$MSG" > "$1"
+echo "$MSG" >& 2
+EOF
+chmod a+x fake_editor.sh
+VISUAL=./fake_editor.sh
+export VISUAL
+
+test_expect_success 'cannot annotate non-existing HEAD' '
+	(MSG=3 && export MSG && test_must_fail git notes edit)
+'
+
+test_expect_success setup '
+	: > a1 &&
+	git add a1 &&
+	test_tick &&
+	git commit -m 1st &&
+	: > a2 &&
+	git add a2 &&
+	test_tick &&
+	git commit -m 2nd
+'
+
+test_expect_success 'need valid notes ref' '
+	(MSG=1 GIT_NOTES_REF=/ && export MSG GIT_NOTES_REF &&
+	 test_must_fail git notes edit) &&
+	(MSG=2 GIT_NOTES_REF=/ && export MSG GIT_NOTES_REF &&
+	 test_must_fail git notes show)
+'
+
+test_expect_success 'refusing to edit in refs/heads/' '
+	(MSG=1 GIT_NOTES_REF=refs/heads/bogus &&
+	 export MSG GIT_NOTES_REF &&
+	 test_must_fail git notes edit)
+'
+
+test_expect_success 'refusing to edit in refs/remotes/' '
+	(MSG=1 GIT_NOTES_REF=refs/remotes/bogus &&
+	 export MSG GIT_NOTES_REF &&
+	 test_must_fail git notes edit)
+'
+
+# 1 indicates caught gracefully by die, 128 means git-show barked
+test_expect_success 'handle empty notes gracefully' '
+	git notes show ; test 1 = $?
+'
+
+test_expect_success 'create notes' '
+	git config core.notesRef refs/notes/commits &&
+	MSG=b1 git notes edit &&
+	test ! -f .git/new-notes &&
+	test 1 = $(git ls-tree refs/notes/commits | wc -l) &&
+	test b1 = $(git notes show) &&
+	git show HEAD^ &&
+	test_must_fail git notes show HEAD^
+'
+
+cat > expect << EOF
+commit 268048bfb8a1fb38e703baceb8ab235421bf80c5
+Author: A U Thor <author@example.com>
+Date:   Thu Apr 7 15:14:13 2005 -0700
+
+    2nd
+
+Notes:
+    b1
+EOF
+
+test_expect_success 'show notes' '
+	! (git cat-file commit HEAD | grep b1) &&
+	git log -1 > output &&
+	test_cmp expect output
+'
+test_expect_success 'create multi-line notes (setup)' '
+	: > a3 &&
+	git add a3 &&
+	test_tick &&
+	git commit -m 3rd &&
+	MSG="b3
+c3c3c3c3
+d3d3d3" git notes edit
+'
+
+cat > expect-multiline << EOF
+commit 1584215f1d29c65e99c6c6848626553fdd07fd75
+Author: A U Thor <author@example.com>
+Date:   Thu Apr 7 15:15:13 2005 -0700
+
+    3rd
+
+Notes:
+    b3
+    c3c3c3c3
+    d3d3d3
+EOF
+
+printf "\n" >> expect-multiline
+cat expect >> expect-multiline
+
+test_expect_success 'show multi-line notes' '
+	git log -2 > output &&
+	test_cmp expect-multiline output
+'
+
+test_done
-- 
1.6.4.304.g1365c.dirty

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCHv6 03/14] Speed up git notes lookup
  2009-09-12 15:52   ` Johan Herland
                       ` (2 preceding siblings ...)
  2009-09-12 16:08     ` [PATCHv6 02/14] Add a script to edit/inspect notes Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 04/14] Add an expensive test for git-notes Johan Herland
                       ` (10 subsequent siblings)
  14 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce, Johannes Schindelin

From: Johannes Schindelin <Johannes.Schindelin@gmx.de>

To avoid looking up each and every commit in the notes ref's tree
object, which is very expensive, speed things up by slurping the tree
object's contents into a hash_map.

The idea for the hashmap singleton is from David Reiss, initial
benchmarking by Jeff King.

Note: the implementation allows for arbitrary entries in the notes
tree object, ignoring those that do not reference a valid object.  This
allows you to annotate arbitrary branches, or objects.
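
For illustration, the open-addressing scheme this patch uses (linear probing,
an all-zero key marking an empty slot, and doubling once the table would be
more than half full) can be sketched as a standalone unit. All names below
(struct slot, struct table, table_put(), table_get()) are invented for the
sketch; it uses calloc() rather than xcalloc() and copies the first key bytes
with memcpy() instead of casting the pointer. As in the patch, an all-zero key
cannot be stored.

```c
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 20

struct slot {
	unsigned char key[KEY_LEN];	/* all-zero key marks an empty slot */
	unsigned char value[KEY_LEN];
};

struct table {
	struct slot *slots;
	size_t count, size;
};

static int key_is_null(const unsigned char *key)
{
	static const unsigned char zero[KEY_LEN];

	return !memcmp(key, zero, KEY_LEN);
}

/*
 * Linear probing. Returns the slot index if `key` is present,
 * otherwise -1 - i, where i is the free slot it would occupy.
 * The table must contain at least one empty slot.
 */
static long table_find(const struct table *t, const unsigned char *key)
{
	unsigned int h;
	size_t i;

	memcpy(&h, key, sizeof(h));	/* avoids an unaligned load */
	i = h % t->size;
	for (;;) {
		const unsigned char *cur = t->slots[i].key;

		if (!memcmp(cur, key, KEY_LEN))
			return (long)i;
		if (key_is_null(cur))
			return -1 - (long)i;
		if (++i == t->size)
			i = 0;
	}
}

/* Insert or overwrite; grows before the table gets more than half full. */
int table_put(struct table *t, const unsigned char *key,
	      const unsigned char *value)
{
	long idx;

	if (t->count + 1 > t->size / 2) {
		struct table bigger;
		size_t i;

		bigger.size = t->size ? t->size * 2 : 64;
		bigger.count = t->count;
		bigger.slots = calloc(bigger.size, sizeof(*bigger.slots));
		if (!bigger.slots)
			return -1;
		/* re-insert every occupied slot into the new table */
		for (i = 0; i < t->size; i++)
			if (!key_is_null(t->slots[i].key)) {
				idx = -1 - table_find(&bigger, t->slots[i].key);
				bigger.slots[idx] = t->slots[i];
			}
		free(t->slots);
		*t = bigger;
	}

	idx = table_find(t, key);
	if (idx < 0) {
		idx = -1 - idx;
		t->count++;
	}
	memcpy(t->slots[idx].key, key, KEY_LEN);
	memcpy(t->slots[idx].value, value, KEY_LEN);
	return 0;
}

/* Returns the stored value, or NULL if `key` is absent. */
const unsigned char *table_get(const struct table *t, const unsigned char *key)
{
	long idx;

	if (!t->size)
		return NULL;
	idx = table_find(t, key);
	return idx < 0 ? NULL : t->slots[idx].value;
}
```

A table starts out zero-initialized (no slots, count and size both 0); the
first table_put() allocates 64 slots. Keeping the load factor at or below one
half guarantees that the probe loop always reaches an empty slot and
terminates.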

This patch has been improved by the following contributions:
- Junio C Hamano: fixed an obvious error in initialize_hash_map()

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 notes.c |  112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 102 insertions(+), 10 deletions(-)

diff --git a/notes.c b/notes.c
index 401966d..9172154 100644
--- a/notes.c
+++ b/notes.c
@@ -4,15 +4,112 @@
 #include "refs.h"
 #include "utf8.h"
 #include "strbuf.h"
+#include "tree-walk.h"
+
+struct entry {
+	unsigned char commit_sha1[20];
+	unsigned char notes_sha1[20];
+};
+
+struct hash_map {
+	struct entry *entries;
+	off_t count, size;
+};
 
 static int initialized;
+static struct hash_map hash_map;
+
+static int hash_index(struct hash_map *map, const unsigned char *sha1)
+{
+	int i = ((*(unsigned int *)sha1) % map->size);
+
+	for (;;) {
+		unsigned char *current = map->entries[i].commit_sha1;
+
+		if (!hashcmp(sha1, current))
+			return i;
+
+		if (is_null_sha1(current))
+			return -1 - i;
+
+		if (++i == map->size)
+			i = 0;
+	}
+}
+
+static void add_entry(const unsigned char *commit_sha1,
+		const unsigned char *notes_sha1)
+{
+	int index;
+
+	if (hash_map.count + 1 > hash_map.size >> 1) {
+		int i, old_size = hash_map.size;
+		struct entry *old = hash_map.entries;
+
+		hash_map.size = old_size ? old_size << 1 : 64;
+		hash_map.entries = (struct entry *)
+			xcalloc(sizeof(struct entry), hash_map.size);
+
+		for (i = 0; i < old_size; i++)
+			if (!is_null_sha1(old[i].commit_sha1)) {
+				index = -1 - hash_index(&hash_map,
+						old[i].commit_sha1);
+				memcpy(hash_map.entries + index, old + i,
+					sizeof(struct entry));
+			}
+		free(old);
+	}
+
+	index = hash_index(&hash_map, commit_sha1);
+	if (index < 0) {
+		index = -1 - index;
+		hash_map.count++;
+	}
+
+	hashcpy(hash_map.entries[index].commit_sha1, commit_sha1);
+	hashcpy(hash_map.entries[index].notes_sha1, notes_sha1);
+}
+
+static void initialize_hash_map(const char *notes_ref_name)
+{
+	unsigned char sha1[20], commit_sha1[20];
+	unsigned mode;
+	struct tree_desc desc;
+	struct name_entry entry;
+	void *buf;
+
+	if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
+	    get_tree_entry(commit_sha1, "", sha1, &mode))
+		return;
+
+	buf = fill_tree_descriptor(&desc, sha1);
+	if (!buf)
+		die("Could not read %s for notes-index", sha1_to_hex(sha1));
+
+	while (tree_entry(&desc, &entry))
+		if (!get_sha1(entry.path, commit_sha1))
+			add_entry(commit_sha1, entry.sha1);
+	free(buf);
+}
+
+static unsigned char *lookup_notes(const unsigned char *commit_sha1)
+{
+	int index;
+
+	if (!hash_map.size)
+		return NULL;
+
+	index = hash_index(&hash_map, commit_sha1);
+	if (index < 0)
+		return NULL;
+	return hash_map.entries[index].notes_sha1;
+}
 
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 		const char *output_encoding)
 {
 	static const char utf8[] = "utf-8";
-	struct strbuf name = STRBUF_INIT;
-	unsigned char sha1[20];
+	unsigned char *sha1;
 	char *msg, *msg_p;
 	unsigned long linelen, msglen;
 	enum object_type type;
@@ -23,17 +120,12 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 			notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
 		else if (!notes_ref_name)
 			notes_ref_name = GIT_NOTES_DEFAULT_REF;
-		if (notes_ref_name && read_ref(notes_ref_name, sha1))
-			notes_ref_name = NULL;
+		initialize_hash_map(notes_ref_name);
 		initialized = 1;
 	}
 
-	if (!notes_ref_name)
-		return;
-
-	strbuf_addf(&name, "%s:%s", notes_ref_name,
-			sha1_to_hex(commit->object.sha1));
-	if (get_sha1(name.buf, sha1))
+	sha1 = lookup_notes(commit->object.sha1);
+	if (!sha1)
 		return;
 
 	if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
-- 
1.6.4.304.g1365c.dirty


* [PATCHv6 04/14] Add an expensive test for git-notes
  2009-09-12 15:52   ` Johan Herland
                       ` (3 preceding siblings ...)
  2009-09-12 16:08     ` [PATCHv6 03/14] Speed up git notes lookup Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 05/14] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
                       ` (9 subsequent siblings)
  14 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce, Johannes Schindelin

From: Johannes Schindelin <Johannes.Schindelin@gmx.de>

git-notes has the potential to be pretty expensive, so test with
a lot of commits.  A lot.  To keep regular test runs cheap, you have
to opt in explicitly by setting the environment variable
GIT_NOTES_TIMING_TESTS.
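
As a rough illustration, the opt-in check can be sketched as a small shell
predicate. The function name `timing_tests_enabled` is hypothetical and is
not part of the patch; only the variable name comes from the text above.

```shell
# Hypothetical helper (not in the patch): succeeds only when the
# opt-in variable is set to a non-empty value.
timing_tests_enabled () {
	test -n "$GIT_NOTES_TIMING_TESTS"
}
```

Assuming the usual way of running a single test script from the t/
directory, the expensive tests would then be enabled with something like
`GIT_NOTES_TIMING_TESTS=1 sh t3302-notes-index-expensive.sh`.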

This patch has been improved by the following contributions:
- Junio C Hamano: tests: fix "export var=val"

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t3302-notes-index-expensive.sh |   98 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 98 insertions(+), 0 deletions(-)
 create mode 100755 t/t3302-notes-index-expensive.sh

diff --git a/t/t3302-notes-index-expensive.sh b/t/t3302-notes-index-expensive.sh
new file mode 100755
index 0000000..0ef3e95
--- /dev/null
+++ b/t/t3302-notes-index-expensive.sh
@@ -0,0 +1,98 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes index (expensive!)'
+
+. ./test-lib.sh
+
+test -z "$GIT_NOTES_TIMING_TESTS" && {
+	say Skipping timing tests
+	test_done
+	exit
+}
+
+create_repo () {
+	number_of_commits=$1
+	nr=0
+	parent=
+	test -d .git || {
+	git init &&
+	tree=$(git write-tree) &&
+	while [ $nr -lt $number_of_commits ]; do
+		test_tick &&
+		commit=$(echo $nr | git commit-tree $tree $parent) ||
+			return
+		parent="-p $commit"
+		nr=$(($nr+1))
+	done &&
+	git update-ref refs/heads/master $commit &&
+	{
+		GIT_INDEX_FILE=.git/temp; export GIT_INDEX_FILE;
+		git rev-list HEAD | cat -n | sed "s/^[ 	][ 	]*/ /g" |
+		while read nr sha1; do
+			blob=$(echo note $nr | git hash-object -w --stdin) &&
+			echo $sha1 | sed "s/^/0644 $blob 0	/"
+		done | git update-index --index-info &&
+		tree=$(git write-tree) &&
+		test_tick &&
+		commit=$(echo notes | git commit-tree $tree) &&
+		git update-ref refs/notes/commits $commit
+	} &&
+	git config core.notesRef refs/notes/commits
+	}
+}
+
+test_notes () {
+	count=$1 &&
+	git config core.notesRef refs/notes/commits &&
+	git log | grep "^    " > output &&
+	i=1 &&
+	while [ $i -le $count ]; do
+		echo "    $(($count-$i))" &&
+		echo "    note $i" &&
+		i=$(($i+1));
+	done > expect &&
+	git diff expect output
+}
+
+cat > time_notes << \EOF
+	mode=$1
+	i=1
+	while [ $i -lt $2 ]; do
+		case $1 in
+		no-notes)
+			GIT_NOTES_REF=non-existing; export GIT_NOTES_REF
+		;;
+		notes)
+			unset GIT_NOTES_REF
+		;;
+		esac
+		git log >/dev/null
+		i=$(($i+1))
+	done
+EOF
+
+time_notes () {
+	for mode in no-notes notes
+	do
+		echo $mode
+		/usr/bin/time sh ../time_notes $mode $1
+	done
+}
+
+for count in 10 100 1000 10000; do
+
+	mkdir $count
+	(cd $count;
+
+	test_expect_success "setup $count" "create_repo $count"
+
+	test_expect_success 'notes work' "test_notes $count"
+
+	test_expect_success 'notes timing' "time_notes 100"
+	)
+done
+
+test_done
-- 
1.6.4.304.g1365c.dirty


* [PATCHv6 05/14] Teach "-m <msg>" and "-F <file>" to "git notes edit"
  2009-09-12 15:52   ` Johan Herland
                       ` (4 preceding siblings ...)
  2009-09-12 16:08     ` [PATCHv6 04/14] Add an expensive test for git-notes Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 06/14] fast-import: Add support for importing commit notes Johan Herland
                       ` (8 subsequent siblings)
  14 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

The "-m" and "-F" options are already the established method
(in both git-commit and git-tag) to specify a commit/tag message
without invoking the editor. This patch teaches "git notes edit"
to respect the same options for specifying a notes message without
invoking the editor.

Multiple "-m" and/or "-F" options are concatenated as separate
paragraphs.
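
The concatenation rule can be sketched in isolation as follows. The helper
name `join_paragraphs` is an assumption made for this example; the patch
itself builds up $MESSAGE inline in the option-parsing loop.

```shell
# Hypothetical helper (not in the patch): joins its arguments into one
# message, separating consecutive arguments with a blank line.
join_paragraphs () {
	message=
	for part in "$@"
	do
		if test -z "$message"
		then
			# First non-empty piece: start the message.
			message="$part"
		else
			# Later pieces: append as a new paragraph.
			message="$message

$part"
		fi
	done
	printf '%s\n' "$message"
}
```

Under this sketch, `join_paragraphs spam xyzzy` yields "spam", an empty
line, then "xyzzy", which is the shape the selftest expects from
`git notes edit -m spam -F note5` when note5 contains "xyzzy".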

The patch also updates the "git notes" documentation and adds
selftests for the new functionality. Unfortunately, the added
selftests include a couple of lines with trailing whitespace
(without these the test will fail). This may cause git to warn
about "whitespace errors".

This patch has been improved by the following contributions:
- Thomas Rast: fix trailing whitespace in t3301

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/git-notes.txt |   16 ++++++++++-
 git-notes.sh                |   64 +++++++++++++++++++++++++++++++++++++-----
 t/t3301-notes.sh            |   36 ++++++++++++++++++++++++
 3 files changed, 107 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt
index 7136016..94cceb1 100644
--- a/Documentation/git-notes.txt
+++ b/Documentation/git-notes.txt
@@ -8,7 +8,7 @@ git-notes - Add/inspect commit notes
 SYNOPSIS
 --------
 [verse]
-'git-notes' (edit | show) [commit]
+'git-notes' (edit [-F <file> | -m <msg>] | show) [commit]
 
 DESCRIPTION
 -----------
@@ -33,6 +33,20 @@ show::
 	Show the notes for a given commit (defaults to HEAD).
 
 
+OPTIONS
+-------
+-m <msg>::
+	Use the given note message (instead of prompting).
+	If multiple `-m` (or `-F`) options are given, their
+	values are concatenated as separate paragraphs.
+
+-F <file>::
+	Take the note message from the given file.  Use '-' to
+	read the note message from the standard input.
+	If multiple `-F` (or `-m`) options are given, their
+	values are concatenated as separate paragraphs.
+
+
 Author
 ------
 Written by Johannes Schindelin <johannes.schindelin@gmx.de>
diff --git a/git-notes.sh b/git-notes.sh
index f06c254..e642e47 100755
--- a/git-notes.sh
+++ b/git-notes.sh
@@ -1,16 +1,59 @@
 #!/bin/sh
 
-USAGE="(edit | show) [commit]"
+USAGE="(edit [-F <file> | -m <msg>] | show) [commit]"
 . git-sh-setup
 
-test -n "$3" && usage
-
 test -z "$1" && usage
 ACTION="$1"; shift
 
 test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
 test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="refs/notes/commits"
 
+MESSAGE=
+while test $# != 0
+do
+	case "$1" in
+	-m)
+		test "$ACTION" = "edit" || usage
+		shift
+		if test "$#" = "0"; then
+			die "error: option -m needs an argument"
+		else
+			if [ -z "$MESSAGE" ]; then
+				MESSAGE="$1"
+			else
+				MESSAGE="$MESSAGE
+
+$1"
+			fi
+			shift
+		fi
+		;;
+	-F)
+		test "$ACTION" = "edit" || usage
+		shift
+		if test "$#" = "0"; then
+			die "error: option -F needs an argument"
+		else
+			if [ -z "$MESSAGE" ]; then
+				MESSAGE="$(cat "$1")"
+			else
+				MESSAGE="$MESSAGE
+
+$(cat "$1")"
+			fi
+			shift
+		fi
+		;;
+	-*)
+		usage
+		;;
+	*)
+		break
+		;;
+	esac
+done
+
 COMMIT=$(git rev-parse --verify --default HEAD "$@") ||
 die "Invalid commit: $@"
 
@@ -29,19 +72,24 @@ edit)
 		test -f "$GIT_INDEX_FILE" && rm "$GIT_INDEX_FILE"
 	' 0
 
-	GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MSG_FILE"
-
 	CURRENT_HEAD=$(git show-ref "$GIT_NOTES_REF" | cut -f 1 -d ' ')
 	if [ -z "$CURRENT_HEAD" ]; then
 		PARENT=
 	else
 		PARENT="-p $CURRENT_HEAD"
 		git read-tree "$GIT_NOTES_REF" || die "Could not read index"
-		git cat-file blob :$COMMIT >> "$MSG_FILE" 2> /dev/null
 	fi
 
-	core_editor="$(git config core.editor)"
-	${GIT_EDITOR:-${core_editor:-${VISUAL:-${EDITOR:-vi}}}} "$MSG_FILE"
+	if [ -z "$MESSAGE" ]; then
+		GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MSG_FILE"
+		if [ ! -z "$CURRENT_HEAD" ]; then
+			git cat-file blob :$COMMIT >> "$MSG_FILE" 2> /dev/null
+		fi
+		core_editor="$(git config core.editor)"
+		${GIT_EDITOR:-${core_editor:-${VISUAL:-${EDITOR:-vi}}}} "$MSG_FILE"
+	else
+		echo "$MESSAGE" > "$MSG_FILE"
+	fi
 
 	grep -v ^# < "$MSG_FILE" | git stripspace > "$MSG_FILE".processed
 	mv "$MSG_FILE".processed "$MSG_FILE"
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
index 73e53be..1e34f48 100755
--- a/t/t3301-notes.sh
+++ b/t/t3301-notes.sh
@@ -110,5 +110,41 @@ test_expect_success 'show multi-line notes' '
 	git log -2 > output &&
 	test_cmp expect-multiline output
 '
+test_expect_success 'create -m and -F notes (setup)' '
+	: > a4 &&
+	git add a4 &&
+	test_tick &&
+	git commit -m 4th &&
+	echo "xyzzy" > note5 &&
+	git notes edit -m spam -F note5 -m "foo
+bar
+baz"
+'
+
+whitespace="    "
+cat > expect-m-and-F << EOF
+commit 15023535574ded8b1a89052b32673f84cf9582b8
+Author: A U Thor <author@example.com>
+Date:   Thu Apr 7 15:16:13 2005 -0700
+
+    4th
+
+Notes:
+    spam
+$whitespace
+    xyzzy
+$whitespace
+    foo
+    bar
+    baz
+EOF
+
+printf "\n" >> expect-m-and-F
+cat expect-multiline >> expect-m-and-F
+
+test_expect_success 'show -m and -F notes' '
+	git log -3 > output &&
+	test_cmp expect-m-and-F output
+'
 
 test_done
-- 
1.6.4.304.g1365c.dirty


* [PATCHv6 06/14] fast-import: Add support for importing commit notes
  2009-09-12 15:52   ` Johan Herland
                       ` (5 preceding siblings ...)
  2009-09-12 16:08     ` [PATCHv6 05/14] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 07/14] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
                       ` (7 subsequent siblings)
  14 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

Introduce a 'notemodify' subcommand of the 'commit' command. This subcommand
is similar to 'filemodify', except that no mode is supplied (all notes have
mode 0644), and the path is set to the hex SHA1 of the given "committish".

This enables fast import of note objects along with their associated commits,
since the notes can now be named using the mark references of their
corresponding commits.
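
A minimal sketch of what a frontend might emit, assuming the blob and the
annotated commit were given marks earlier in the same stream. The helper
name `emit_note_commit` is hypothetical, and the committer identity and
timestamp are placeholder values.

```shell
# Hypothetical helper (not in the patch): prints a fast-import 'commit'
# command on the given notes ref that attaches the blob with mark
# :<blob_mark> as a note on the commit with mark :<commit_mark>.
emit_note_commit () {
	notes_ref=$1 blob_mark=$2 commit_mark=$3
	cat <<EOF
commit $notes_ref
committer A U Thor <author@example.com> 1112912173 -0700
data <<COMMIT
notes
COMMIT

N :$blob_mark :$commit_mark

EOF
}
```

For example, `emit_note_commit refs/notes/foobar 7 3` prints a commit
command containing the line `N :7 :3`. On its own this is not a complete
stream: the blob :7 and the commit :3 would have to be defined before it.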

The patch also includes a test case of the added functionality.

Signed-off-by: Johan Herland <johan@herland.net>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
---
 Documentation/git-fast-import.txt |   45 +++++++++--
 fast-import.c                     |   88 +++++++++++++++++++-
 t/t9300-fast-import.sh            |  166 +++++++++++++++++++++++++++++++++++++
 3 files changed, 289 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index f1c94b4..bb198c2 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -325,7 +325,7 @@ change to the project.
 	data
 	('from' SP <committish> LF)?
 	('merge' SP <committish> LF)?
-	(filemodify | filedelete | filecopy | filerename | filedeleteall)*
+	(filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
 	LF?
 ....
 
@@ -348,14 +348,13 @@ commit message use a 0 length data.  Commit messages are free-form
 and are not interpreted by Git.  Currently they must be encoded in
 UTF-8, as fast-import does not permit other encodings to be specified.
 
-Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`
-and `filedeleteall` commands
+Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`,
+`filedeleteall` and `notemodify` commands
 may be included to update the contents of the branch prior to
 creating the commit.  These commands may be supplied in any order.
 However it is recommended that a `filedeleteall` command precede
-all `filemodify`, `filecopy` and `filerename` commands in the same
-commit, as `filedeleteall`
-wipes the branch clean (see below).
+all `filemodify`, `filecopy`, `filerename` and `notemodify` commands in
+the same commit, as `filedeleteall` wipes the branch clean (see below).
 
 The `LF` after the command is optional (it used to be required).
 
@@ -604,6 +603,40 @@ more memory per active branch (less than 1 MiB for even most large
 projects); so frontends that can easily obtain only the affected
 paths for a commit are encouraged to do so.
 
+`notemodify`
+^^^^^^^^^^^^
+Included in a `commit` command to add a new note (annotating a given
+commit) or change the content of an existing note.  This command has
+two different means of specifying the content of the note.
+
+External data format::
+	The data content for the note was already supplied by a prior
+	`blob` command.  The frontend just needs to connect it to the
+	commit that is to be annotated.
++
+....
+	'N' SP <dataref> SP <committish> LF
+....
++
+Here `<dataref>` can be either a mark reference (`:<idnum>`)
+set by a prior `blob` command, or a full 40-byte SHA-1 of an
+existing Git blob object.
+
+Inline data format::
+	The data content for the note has not been supplied yet.
+	The frontend wants to supply it as part of this modify
+	command.
++
+....
+	'N' SP 'inline' SP <committish> LF
+	data
+....
++
+See below for a detailed description of the `data` command.
+
+In both formats `<committish>` is any of the commit specification
+expressions also accepted by `from` (see above).
+
 `mark`
 ~~~~~~
 Arranges for fast-import to save a reference to the current object, allowing
diff --git a/fast-import.c b/fast-import.c
index dcfb8fa..1e91358 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -22,8 +22,8 @@ Format of STDIN stream:
     ('author' sp name sp '<' email '>' sp when lf)?
     'committer' sp name sp '<' email '>' sp when lf
     commit_msg
-    ('from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)?
-    ('merge' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)*
+    ('from' sp committish lf)?
+    ('merge' sp committish lf)*
     file_change*
     lf?;
   commit_msg ::= data;
@@ -41,15 +41,18 @@ Format of STDIN stream:
   file_obm ::= 'M' sp mode sp (hexsha1 | idnum) sp path_str lf;
   file_inm ::= 'M' sp mode sp 'inline' sp path_str lf
     data;
+  note_obm ::= 'N' sp (hexsha1 | idnum) sp committish lf;
+  note_inm ::= 'N' sp 'inline' sp committish lf
+    data;
 
   new_tag ::= 'tag' sp tag_str lf
-    'from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf
+    'from' sp committish lf
     ('tagger' sp name sp '<' email '>' sp when lf)?
     tag_msg;
   tag_msg ::= data;
 
   reset_branch ::= 'reset' sp ref_str lf
-    ('from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)?
+    ('from' sp committish lf)?
     lf?;
 
   checkpoint ::= 'checkpoint' lf
@@ -88,6 +91,7 @@ Format of STDIN stream:
      # stream formatting is: \, " and LF.  Otherwise these values
      # are UTF8.
      #
+  committish  ::= (ref_str | hexsha1 | sha1exp_str | idnum);
   ref_str     ::= ref;
   sha1exp_str ::= sha1exp;
   tag_str     ::= tag;
@@ -2053,6 +2057,80 @@ static void file_change_cr(struct branch *b, int rename)
 		leaf.tree);
 }
 
+static void note_change_n(struct branch *b)
+{
+	const char *p = command_buf.buf + 2;
+	static struct strbuf uq = STRBUF_INIT;
+	struct object_entry *oe = oe;
+	struct branch *s;
+	unsigned char sha1[20], commit_sha1[20];
+	uint16_t inline_data = 0;
+
+	/* <dataref> or 'inline' */
+	if (*p == ':') {
+		char *x;
+		oe = find_mark(strtoumax(p + 1, &x, 10));
+		hashcpy(sha1, oe->sha1);
+		p = x;
+	} else if (!prefixcmp(p, "inline")) {
+		inline_data = 1;
+		p += 6;
+	} else {
+		if (get_sha1_hex(p, sha1))
+			die("Invalid SHA1: %s", command_buf.buf);
+		oe = find_object(sha1);
+		p += 40;
+	}
+	if (*p++ != ' ')
+		die("Missing space after SHA1: %s", command_buf.buf);
+
+	/* <committish> */
+	s = lookup_branch(p);
+	if (s) {
+		hashcpy(commit_sha1, s->sha1);
+	} else if (*p == ':') {
+		uintmax_t commit_mark = strtoumax(p + 1, NULL, 10);
+		struct object_entry *commit_oe = find_mark(commit_mark);
+		if (commit_oe->type != OBJ_COMMIT)
+			die("Mark :%" PRIuMAX " not a commit", commit_mark);
+		hashcpy(commit_sha1, commit_oe->sha1);
+	} else if (!get_sha1(p, commit_sha1)) {
+		unsigned long size;
+		char *buf = read_object_with_reference(commit_sha1,
+			commit_type, &size, commit_sha1);
+		if (!buf || size < 46)
+			die("Not a valid commit: %s", p);
+		free(buf);
+	} else
+		die("Invalid ref name or SHA1 expression: %s", p);
+
+	if (inline_data) {
+		static struct strbuf buf = STRBUF_INIT;
+
+		if (p != uq.buf) {
+			strbuf_addstr(&uq, p);
+			p = uq.buf;
+		}
+		read_next_command();
+		parse_data(&buf);
+		store_object(OBJ_BLOB, &buf, &last_blob, sha1, 0);
+	} else if (oe) {
+		if (oe->type != OBJ_BLOB)
+			die("Not a blob (actually a %s): %s",
+				typename(oe->type), command_buf.buf);
+	} else {
+		enum object_type type = sha1_object_info(sha1, NULL);
+		if (type < 0)
+			die("Blob not found: %s", command_buf.buf);
+		if (type != OBJ_BLOB)
+			die("Not a blob (actually a %s): %s",
+			    typename(type), command_buf.buf);
+	}
+
+	tree_content_set(&b->branch_tree, sha1_to_hex(commit_sha1), sha1,
+		S_IFREG | 0644, NULL);
+}
+
 static void file_change_deleteall(struct branch *b)
 {
 	release_tree_content_recursive(b->branch_tree.tree);
@@ -2222,6 +2300,8 @@ static void parse_new_commit(void)
 			file_change_cr(b, 1);
 		else if (!prefixcmp(command_buf.buf, "C "))
 			file_change_cr(b, 0);
+		else if (!prefixcmp(command_buf.buf, "N "))
+			note_change_n(b);
 		else if (!strcmp("deleteall", command_buf.buf))
 			file_change_deleteall(b);
 		else {
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index d33fc55..2f5c323 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1089,6 +1089,172 @@ test_expect_success 'P: fail on blob mark in gitlink' '
     test_must_fail git fast-import <input'
 
 ###
+### series Q (notes)
+###
+
+note1_data="Note for the first commit"
+note2_data="Note for the second commit"
+note3_data="Note for the third commit"
+
+test_tick
+cat >input <<INPUT_END
+blob
+mark :2
+data <<EOF
+$file2_data
+EOF
+
+commit refs/heads/notes-test
+mark :3
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+first (:3)
+COMMIT
+
+M 644 :2 file2
+
+blob
+mark :4
+data $file4_len
+$file4_data
+commit refs/heads/notes-test
+mark :5
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+second (:5)
+COMMIT
+
+M 644 :4 file4
+
+commit refs/heads/notes-test
+mark :6
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+third (:6)
+COMMIT
+
+M 644 inline file5
+data <<EOF
+$file5_data
+EOF
+
+M 755 inline file6
+data <<EOF
+$file6_data
+EOF
+
+blob
+mark :7
+data <<EOF
+$note1_data
+EOF
+
+blob
+mark :8
+data <<EOF
+$note2_data
+EOF
+
+commit refs/notes/foobar
+mark :9
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+notes (:9)
+COMMIT
+
+N :7 :3
+N :8 :5
+N inline :6
+data <<EOF
+$note3_data
+EOF
+
+INPUT_END
+test_expect_success \
+	'Q: commit notes' \
+	'git fast-import <input &&
+	 git whatchanged notes-test'
+test_expect_success \
+	'Q: verify pack' \
+	'for p in .git/objects/pack/*.pack;do git verify-pack $p||exit;done'
+
+commit1=$(git rev-parse notes-test~2)
+commit2=$(git rev-parse notes-test^)
+commit3=$(git rev-parse notes-test)
+
+cat >expect <<EOF
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+first (:3)
+EOF
+test_expect_success \
+	'Q: verify first commit' \
+	'git cat-file commit notes-test~2 | sed 1d >actual &&
+	test_cmp expect actual'
+
+cat >expect <<EOF
+parent $commit1
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+second (:5)
+EOF
+test_expect_success \
+	'Q: verify second commit' \
+	'git cat-file commit notes-test^ | sed 1d >actual &&
+	test_cmp expect actual'
+
+cat >expect <<EOF
+parent $commit2
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+third (:6)
+EOF
+test_expect_success \
+	'Q: verify third commit' \
+	'git cat-file commit notes-test | sed 1d >actual &&
+	test_cmp expect actual'
+
+cat >expect <<EOF
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+notes (:9)
+EOF
+test_expect_success \
+	'Q: verify notes commit' \
+	'git cat-file commit refs/notes/foobar | sed 1d >actual &&
+	test_cmp expect actual'
+
+cat >expect.unsorted <<EOF
+100644 blob $commit1
+100644 blob $commit2
+100644 blob $commit3
+EOF
+cat expect.unsorted | sort >expect
+test_expect_success \
+	'Q: verify notes tree' \
+	'git cat-file -p refs/notes/foobar^{tree} | sed "s/ [0-9a-f]*	/ /" >actual &&
+	 test_cmp expect actual'
+
+echo "$note1_data" >expect
+test_expect_success \
+	'Q: verify note for first commit' \
+	'git cat-file blob refs/notes/foobar:$commit1 >actual && test_cmp expect actual'
+
+echo "$note2_data" >expect
+test_expect_success \
+	'Q: verify note for second commit' \
+	'git cat-file blob refs/notes/foobar:$commit2 >actual && test_cmp expect actual'
+
+echo "$note3_data" >expect
+test_expect_success \
+	'Q: verify note for third commit' \
+	'git cat-file blob refs/notes/foobar:$commit3 >actual && test_cmp expect actual'
+
+###
 ### series R (feature and option)
 ###
 
-- 
1.6.4.304.g1365c.dirty


* [PATCHv6 07/14] t3302-notes-index-expensive: Speed up create_repo()
  2009-09-12 15:52   ` Johan Herland
                       ` (6 preceding siblings ...)
  2009-09-12 16:08     ` [PATCHv6 06/14] fast-import: Add support for importing commit notes Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 08/14] Add flags to get_commit_notes() to control the format of the note string Johan Herland
                       ` (6 subsequent siblings)
  14 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

Creating repos with 10/100/1000/10000 commits and notes takes a lot of time.
However, using git-fast-import to do the job is a lot more efficient than
using plumbing commands to do the same.

This patch decreases the overall run-time of this test on my machine from
~3 minutes to ~1 minute.
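
The mark numbering used to pair each commit with its note can be sketched
on its own. The helper name `note_line_for_commit` is hypothetical; the
arithmetic follows the loop in the patch below, where commit number nr
gets mark 2*nr and its note blob gets mark 2*nr+1.

```shell
# Hypothetical helper (not in the patch): prints the 'N' line that
# attaches the note blob for commit number <nr> to that commit.
note_line_for_commit () {
	nr=$1
	mark=$(($nr+$nr))       # even marks are commits
	notemark=$(($mark+1))   # odd marks are note blobs
	echo "N :$notemark :$mark"
}
```

With this scheme commit marks and note marks never collide, so the 'N'
lines can be collected separately and appended to a single notes commit
at the end of the stream.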

Signed-off-by: Johan Herland <johan@herland.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t3302-notes-index-expensive.sh |   74 ++++++++++++++++++++++++--------------
 1 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/t/t3302-notes-index-expensive.sh b/t/t3302-notes-index-expensive.sh
index 0ef3e95..ee84fc4 100755
--- a/t/t3302-notes-index-expensive.sh
+++ b/t/t3302-notes-index-expensive.sh
@@ -16,30 +16,50 @@ test -z "$GIT_NOTES_TIMING_TESTS" && {
 create_repo () {
 	number_of_commits=$1
 	nr=0
-	parent=
 	test -d .git || {
 	git init &&
-	tree=$(git write-tree) &&
-	while [ $nr -lt $number_of_commits ]; do
-		test_tick &&
-		commit=$(echo $nr | git commit-tree $tree $parent) ||
-			return
-		parent="-p $commit"
-		nr=$(($nr+1))
-	done &&
-	git update-ref refs/heads/master $commit &&
-	{
-		GIT_INDEX_FILE=.git/temp; export GIT_INDEX_FILE;
-		git rev-list HEAD | cat -n | sed "s/^[ 	][ 	]*/ /g" |
-		while read nr sha1; do
-			blob=$(echo note $nr | git hash-object -w --stdin) &&
-			echo $sha1 | sed "s/^/0644 $blob 0	/"
-		done | git update-index --index-info &&
-		tree=$(git write-tree) &&
+	(
+		while [ $nr -lt $number_of_commits ]; do
+			nr=$(($nr+1))
+			mark=$(($nr+$nr))
+			notemark=$(($mark+1))
+			test_tick &&
+			cat <<INPUT_END &&
+commit refs/heads/master
+mark :$mark
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit #$nr
+COMMIT
+
+M 644 inline file
+data <<EOF
+file in commit #$nr
+EOF
+
+blob
+mark :$notemark
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+			echo "N :$notemark :$mark" >> note_commit
+		done &&
 		test_tick &&
-		commit=$(echo notes | git commit-tree $tree) &&
-		git update-ref refs/notes/commits $commit
-	} &&
+		cat <<INPUT_END &&
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+notes
+COMMIT
+
+INPUT_END
+
+		cat note_commit
+	) |
+	git fast-import --quiet &&
 	git config core.notesRef refs/notes/commits
 	}
 }
@@ -48,13 +68,13 @@ test_notes () {
 	count=$1 &&
 	git config core.notesRef refs/notes/commits &&
 	git log | grep "^    " > output &&
-	i=1 &&
-	while [ $i -le $count ]; do
-		echo "    $(($count-$i))" &&
-		echo "    note $i" &&
-		i=$(($i+1));
+	i=$count &&
+	while [ $i -gt 0 ]; do
+		echo "    commit #$i" &&
+		echo "    note for commit #$i" &&
+		i=$(($i-1));
 	done > expect &&
-	git diff expect output
+	test_cmp expect output
 }
 
 cat > time_notes << \EOF
-- 
1.6.4.304.g1365c.dirty


* [PATCHv6 08/14] Add flags to get_commit_notes() to control the format of the note string
  2009-09-12 15:52   ` Johan Herland
                       ` (7 preceding siblings ...)
  2009-09-12 16:08     ` [PATCHv6 07/14] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 09/14] Add '%N'-format for pretty-printing commit notes Johan Herland
                       ` (5 subsequent siblings)
  14 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

This patch adds the following flags to get_commit_notes() for adjusting the
format of the produced note string:
- NOTES_SHOW_HEADER: Print "Notes:" line before the notes contents
- NOTES_INDENT: Indent notes contents by 4 spaces
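
The effect of the two flags can be modelled in shell as below. This is only
a sketch of the intended output format, not the C implementation; the
function name `format_note` is hypothetical, while the flag values match
the #defines added to notes.h in this patch.

```shell
# Flag values as defined in notes.h by this patch.
NOTES_SHOW_HEADER=1
NOTES_INDENT=2

# Hypothetical shell model of the formatting: <flags> is a bitmask,
# <note> is the note text without a trailing newline.
format_note () {
	flags=$1 note=$2
	if test $(($flags & $NOTES_SHOW_HEADER)) -ne 0
	then
		printf '\nNotes:\n'
	fi
	if test $(($flags & $NOTES_INDENT)) -ne 0
	then
		# Indent every line, including empty ones, by 4 spaces.
		printf '%s\n' "$note" | sed 's/^/    /'
	else
		printf '%s\n' "$note"
	fi
}
```

In this model, passing 3 (both flags) reproduces the layout that git log
showed before this patch, and passing 0 emits the bare note text.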

Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c  |    8 +++++---
 notes.h  |    5 ++++-
 pretty.c |    3 ++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/notes.c b/notes.c
index 9172154..84c30c1 100644
--- a/notes.c
+++ b/notes.c
@@ -106,7 +106,7 @@ static unsigned char *lookup_notes(const unsigned char *commit_sha1)
 }
 
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
-		const char *output_encoding)
+		const char *output_encoding, int flags)
 {
 	static const char utf8[] = "utf-8";
 	unsigned char *sha1;
@@ -146,12 +146,14 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 	if (msglen && msg[msglen - 1] == '\n')
 		msglen--;
 
-	strbuf_addstr(sb, "\nNotes:\n");
+	if (flags & NOTES_SHOW_HEADER)
+		strbuf_addstr(sb, "\nNotes:\n");
 
 	for (msg_p = msg; msg_p < msg + msglen; msg_p += linelen + 1) {
 		linelen = strchrnul(msg_p, '\n') - msg_p;
 
-		strbuf_addstr(sb, "    ");
+		if (flags & NOTES_INDENT)
+			strbuf_addstr(sb, "    ");
 		strbuf_add(sb, msg_p, linelen);
 		strbuf_addch(sb, '\n');
 	}
diff --git a/notes.h b/notes.h
index 79d21b6..7f3eed4 100644
--- a/notes.h
+++ b/notes.h
@@ -1,7 +1,10 @@
 #ifndef NOTES_H
 #define NOTES_H
 
+#define NOTES_SHOW_HEADER 1
+#define NOTES_INDENT 2
+
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
-		const char *output_encoding);
+		const char *output_encoding, int flags);
 
 #endif
diff --git a/pretty.c b/pretty.c
index e25db81..01eadd0 100644
--- a/pretty.c
+++ b/pretty.c
@@ -978,7 +978,8 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
 		strbuf_addch(sb, '\n');
 
 	if (fmt != CMIT_FMT_ONELINE)
-		get_commit_notes(commit, sb, encoding);
+		get_commit_notes(commit, sb, encoding,
+				 NOTES_SHOW_HEADER | NOTES_INDENT);
 
 	free(reencoded);
 }
-- 
1.6.4.304.g1365c.dirty


* [PATCHv6 09/14] Add '%N'-format for pretty-printing commit notes
  2009-09-12 15:52   ` Johan Herland
                       ` (8 preceding siblings ...)
  2009-09-12 16:08     ` [PATCHv6 08/14] Add flags to get_commit_notes() to control the format of the note string Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 10/14] Teach notes code to free its internal data structures on request Johan Herland
                       ` (4 subsequent siblings)
  14 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

From: Johannes Schindelin <Johannes.Schindelin@gmx.de>

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/pretty-formats.txt |    1 +
 pretty.c                         |    4 ++++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index 2a845b1..5fb10b3 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -123,6 +123,7 @@ The placeholders are:
 - '%s': subject
 - '%f': sanitized subject line, suitable for a filename
 - '%b': body
+- '%N': commit notes
 - '%Cred': switch color to red
 - '%Cgreen': switch color to green
 - '%Cblue': switch color to blue
diff --git a/pretty.c b/pretty.c
index 01eadd0..7f350bb 100644
--- a/pretty.c
+++ b/pretty.c
@@ -702,6 +702,10 @@ static size_t format_commit_item(struct strbuf *sb, const char *placeholder,
 	case 'd':
 		format_decoration(sb, commit);
 		return 1;
+	case 'N':
+		get_commit_notes(commit, sb, git_log_output_encoding ?
+			     git_log_output_encoding : git_commit_encoding, 0);
+		return 1;
 	}
 
 	/* For the rest we have to parse the commit header. */
-- 
1.6.4.304.g1365c.dirty


* [PATCHv6 10/14] Teach notes code to free its internal data structures on request.
  2009-09-12 15:52   ` Johan Herland
                       ` (9 preceding siblings ...)
  2009-09-12 16:08     ` [PATCHv6 09/14] Add '%N'-format for pretty-printing commit notes Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 18:40       ` Junio C Hamano
  2009-09-12 16:08     ` [PATCHv6 11/14] Teach the notes lookup code to parse notes trees with various fanout schemes Johan Herland
                       ` (3 subsequent siblings)
  14 siblings, 1 reply; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

There's no need to be rude to memory-conscious callers...

Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c |    7 +++++++
 notes.h |    2 ++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/notes.c b/notes.c
index 84c30c1..008c3d4 100644
--- a/notes.c
+++ b/notes.c
@@ -160,3 +160,10 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 
 	free(msg);
 }
+
+void free_commit_notes()
+{
+	free(hash_map.entries);
+	memset(&hash_map, 0, sizeof(struct hash_map));
+	initialized = 0;
+}
diff --git a/notes.h b/notes.h
index 7f3eed4..41802e5 100644
--- a/notes.h
+++ b/notes.h
@@ -7,4 +7,6 @@
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 		const char *output_encoding, int flags);
 
+void free_commit_notes();
+
 #endif
-- 
1.6.4.304.g1365c.dirty

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCHv6 11/14] Teach the notes lookup code to parse notes trees with various fanout schemes
  2009-09-12 15:52   ` Johan Herland
                       ` (10 preceding siblings ...)
  2009-09-12 16:08     ` [PATCHv6 10/14] Teach notes code to free its internal data structures on request Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 12/14] Selftests verifying semantics when loading notes trees with various fanouts Johan Herland
                       ` (2 subsequent siblings)
  14 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce, Johannes Schindelin

The semantics used when parsing notes trees (with regard to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
  objects referencing a given commit, only one of those objects is used.
- If a notes object for a given commit is present in the "root" notes tree,
  no subtrees are consulted; the object in the root tree is used directly.
- If more than one subtree prefix-matches the given commit, only the
  subtree with the longest matching prefix is consulted. This means that
  if the given commit is e.g. "deadbeef", and the notes tree has
  subtrees "de" and "dead", then the following paths in the notes tree are
  searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must reference a whole number of bytes
  from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
  acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
  recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
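
The prefix rules above can be sketched in isolation as follows. This is
only an illustration operating on hex strings, not the code in this patch;
subtree_matches() and pick_subtree() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does fanout directory 'dir' prefix-match the commit? */
static int subtree_matches(const char *commit_hex, const char *dir)
{
	size_t len = strlen(dir);

	if (len == 0 || len % 2 != 0)	/* must cover a whole number of bytes */
		return 0;
	if (len >= strlen(commit_hex))	/* a directory cannot name the whole SHA1 */
		return 0;
	return !strncmp(commit_hex, dir, len);
}

/* Return the index of the longest matching directory, or -1 if none match. */
static int pick_subtree(const char *commit_hex,
		const char *const *dirs, size_t nr)
{
	int best = -1;
	size_t best_len = 0, i;

	for (i = 0; i < nr; i++) {
		size_t len = strlen(dirs[i]);
		if (subtree_matches(commit_hex, dirs[i]) && len > best_len) {
			best = (int)i;
			best_len = len;
		}
	}
	return best;
}
```

With the directories "de", "dead", "d" and "dea", a lookup for "deadbeef"
would select "dead": "d" and "dea" are rejected for not covering whole
bytes, and "de" loses to the longer match.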

This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
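
As a rough, standalone sketch of the idea (not the code in this patch): the
demo_* names are made up for illustration, subtree entries are left out, and
the sketch reads the high nibble of each byte first, which is an assumption
that may differ from the GET_NIBBLE() macro below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 20

/* Pointer tags kept in the two low bits of each child pointer */
enum { T_NULL = 0, T_INTERNAL = 1, T_NOTE = 2 };

struct demo_node { void *a[16]; };
struct demo_leaf { unsigned char key[KEY_LEN]; int value; };

static unsigned tag_of(void *p) { return (unsigned)((uintptr_t)p & 3); }
static void *untag(void *p) { return (void *)((uintptr_t)p & ~(uintptr_t)3); }
static void *tag(void *p, unsigned t) { return (void *)((uintptr_t)p | t); }

/* n-th nibble of the key, high nibble of each byte first (assumption) */
static unsigned nibble(const unsigned char *key, unsigned n)
{
	unsigned char byte = key[n >> 1];
	return (n & 1) ? (byte & 0x0fu) : (unsigned)(byte >> 4);
}

/* Returns 0 on success, -1 on duplicate key or allocation failure */
static int demo_insert(struct demo_node *tree, unsigned n,
		struct demo_leaf *leaf)
{
	unsigned i = nibble(leaf->key, n);
	void *p = tree->a[i];
	struct demo_node *fresh;
	struct demo_leaf *old;

	switch (tag_of(p)) {
	case T_NULL:
		tree->a[i] = tag(leaf, T_NOTE);
		return 0;
	case T_INTERNAL:
		return demo_insert(untag(p), n + 1, leaf);
	default:
		old = untag(p);
		if (!memcmp(old->key, leaf->key, KEY_LEN))
			return -1;
		/* Slot is taken: push both leaves one level down */
		fresh = calloc(1, sizeof(*fresh));
		if (!fresh)
			return -1;
		demo_insert(fresh, n + 1, old);
		tree->a[i] = tag(fresh, T_INTERNAL);
		return demo_insert(fresh, n + 1, leaf);
	}
}

static struct demo_leaf *demo_find(struct demo_node *tree, unsigned n,
		const unsigned char *key)
{
	void *p = tree->a[nibble(key, n)];
	struct demo_leaf *leaf;

	switch (tag_of(p)) {
	case T_INTERNAL:
		return demo_find(untag(p), n + 1, key);
	case T_NOTE:
		leaf = untag(p);
		return memcmp(leaf->key, key, KEY_LEN) ? NULL : leaf;
	default:
		return NULL;
	}
}
```

The sketch relies on leaves and nodes being at least 4-byte aligned, so the
two low pointer bits are free to carry the tag.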

The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).

In addition, there is the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).

As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.

However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.

Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c |  317 +++++++++++++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 248 insertions(+), 69 deletions(-)

diff --git a/notes.c b/notes.c
index 008c3d4..6926aa6 100644
--- a/notes.c
+++ b/notes.c
@@ -6,103 +6,282 @@
 #include "strbuf.h"
 #include "tree-walk.h"
 
-struct entry {
-	unsigned char commit_sha1[20];
-	unsigned char notes_sha1[20];
+/*
+ * Use a non-balancing simple 16-tree structure with struct int_node as
+ * internal nodes, and struct leaf_node as leaf nodes. Each int_node has a
+ * 16-array of pointers to its children.
+ * The bottom 2 bits of each pointer are used to identify the pointer type
+ * - ptr & 3 == 0 - NULL pointer, assert(ptr == NULL)
+ * - ptr & 3 == 1 - pointer to next internal node - cast to struct int_node *
+ * - ptr & 3 == 2 - pointer to note entry - cast to struct leaf_node *
+ * - ptr & 3 == 3 - pointer to subtree entry - cast to struct leaf_node *
+ *
+ * The root node is a statically allocated struct int_node.
+ */
+struct int_node {
+	void *a[16];
 };
 
-struct hash_map {
-	struct entry *entries;
-	off_t count, size;
+/*
+ * Leaf nodes come in two variants, note entries and subtree entries,
+ * distinguished by the LSb of the leaf node pointer (see above).
+ * As a note entry, the key is the SHA1 of the referenced commit, and the
+ * value is the SHA1 of the note object.
+ * As a subtree entry, the key is the prefix SHA1 (w/trailing NULs) of the
+ * referenced commit, using the last byte of the key to store the length of
+ * the prefix. The value is the SHA1 of the tree object containing the notes
+ * subtree.
+ */
+struct leaf_node {
+	unsigned char key_sha1[20];
+	unsigned char val_sha1[20];
 };
 
-static int initialized;
-static struct hash_map hash_map;
+#define PTR_TYPE_NULL     0
+#define PTR_TYPE_INTERNAL 1
+#define PTR_TYPE_NOTE     2
+#define PTR_TYPE_SUBTREE  3
 
-static int hash_index(struct hash_map *map, const unsigned char *sha1)
-{
-	int i = ((*(unsigned int *)sha1) % map->size);
+#define GET_PTR_TYPE(ptr)       ((uintptr_t) (ptr) & 3)
+#define CLR_PTR_TYPE(ptr)       ((void *) ((uintptr_t) (ptr) & ~3))
+#define SET_PTR_TYPE(ptr, type) ((void *) ((uintptr_t) (ptr) | (type)))
 
-	for (;;) {
-		unsigned char *current = map->entries[i].commit_sha1;
+#define GET_NIBBLE(n, sha1) (((sha1[n >> 1]) >> ((n & 0x01) << 2)) & 0x0f)
 
-		if (!hashcmp(sha1, current))
-			return i;
+#define SUBTREE_SHA1_PREFIXCMP(key_sha1, subtree_sha1) \
+	(memcmp(key_sha1, subtree_sha1, subtree_sha1[19]))
 
-		if (is_null_sha1(current))
-			return -1 - i;
+static struct int_node root_node;
 
-		if (++i == map->size)
-			i = 0;
+static int initialized;
+
+static void load_subtree(struct leaf_node *subtree, struct int_node *node,
+		unsigned int n);
+
+/*
+ * To find a leaf_node:
+ * 1. Start at the root node, with n = 0
+ * 2. Use the nth nibble of the key as an index into a:
+ *    - If a[n] is an int_node, recurse into that node and increment n
+ *    - If a leaf_node with matching key, return leaf_node (assert note entry)
+ *    - If a matching subtree entry, unpack that subtree entry (and remove it);
+ *      restart search at the current level.
+ *    - Otherwise, we end up at a NULL pointer, or a non-matching leaf_node.
+ *      Backtrack out of the recursion, one level at a time and check a[0]:
+ *      - If a[0] at the current level is a matching subtree entry, unpack that
+ *        subtree entry (and remove it); restart search at the current level.
+ */
+static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
+		const unsigned char *key_sha1)
+{
+	struct leaf_node *l;
+	unsigned char i = GET_NIBBLE(n, key_sha1);
+	void *p = tree->a[i];
+
+	switch(GET_PTR_TYPE(p)) {
+	case PTR_TYPE_INTERNAL:
+		l = note_tree_find(CLR_PTR_TYPE(p), n + 1, key_sha1);
+		if (l)
+			return l;
+		break;
+	case PTR_TYPE_NOTE:
+		l = (struct leaf_node *) CLR_PTR_TYPE(p);
+		if (!hashcmp(key_sha1, l->key_sha1))
+			return l; /* return note object matching given key */
+		break;
+	case PTR_TYPE_SUBTREE:
+		l = (struct leaf_node *) CLR_PTR_TYPE(p);
+		if (!SUBTREE_SHA1_PREFIXCMP(key_sha1, l->key_sha1)) {
+			/* unpack tree and resume search */
+			tree->a[i] = NULL;
+			load_subtree(l, tree, n);
+			free(l);
+			return note_tree_find(tree, n, key_sha1);
+		}
+		break;
+	case PTR_TYPE_NULL:
+	default:
+		assert(!p);
+		break;
 	}
+
+	/*
+	 * Did not find key at this (or any lower) level.
+	 * Check if there's a matching subtree entry in tree->a[0].
+	 * If so, unpack tree and resume search.
+	 */
+	p = tree->a[0];
+	if (GET_PTR_TYPE(p) != PTR_TYPE_SUBTREE)
+		return NULL;
+	l = (struct leaf_node *) CLR_PTR_TYPE(p);
+	if (!SUBTREE_SHA1_PREFIXCMP(key_sha1, l->key_sha1)) {
+		/* unpack tree and resume search */
+		tree->a[0] = NULL;
+		load_subtree(l, tree, n);
+		free(l);
+		return note_tree_find(tree, n, key_sha1);
+	}
+	return NULL;
 }
 
-static void add_entry(const unsigned char *commit_sha1,
-		const unsigned char *notes_sha1)
+/*
+ * To insert a leaf_node:
+ * 1. Start at the root node, with n = 0
+ * 2. Use the nth nibble of the key as an index into a:
+ *    - If a[n] is NULL, store the tweaked pointer directly into a[n]
+ *    - If a[n] is an int_node, recurse into that node and increment n
+ *    - If a[n] is a leaf_node:
+ *      1. Check if they're equal, and handle that (abort? overwrite?)
+ *      2. Create a new int_node, and store both leaf_nodes there
+ *      3. Store the new int_node into a[n].
+ */
+static int note_tree_insert(struct int_node *tree, unsigned char n,
+		const struct leaf_node *entry, unsigned char type)
 {
-	int index;
-
-	if (hash_map.count + 1 > hash_map.size >> 1) {
-		int i, old_size = hash_map.size;
-		struct entry *old = hash_map.entries;
-
-		hash_map.size = old_size ? old_size << 1 : 64;
-		hash_map.entries = (struct entry *)
-			xcalloc(sizeof(struct entry), hash_map.size);
-
-		for (i = 0; i < old_size; i++)
-			if (!is_null_sha1(old[i].commit_sha1)) {
-				index = -1 - hash_index(&hash_map,
-						old[i].commit_sha1);
-				memcpy(hash_map.entries + index, old + i,
-					sizeof(struct entry));
-			}
-		free(old);
+	struct int_node *new_node;
+	const struct leaf_node *l;
+	int ret;
+	unsigned char i = GET_NIBBLE(n, entry->key_sha1);
+	void *p = tree->a[i];
+	assert(GET_PTR_TYPE(entry) == PTR_TYPE_NULL);
+	switch(GET_PTR_TYPE(p)) {
+	case PTR_TYPE_NULL:
+		assert(!p);
+		tree->a[i] = SET_PTR_TYPE(entry, type);
+		return 0;
+	case PTR_TYPE_INTERNAL:
+		return note_tree_insert(CLR_PTR_TYPE(p), n + 1, entry, type);
+	default:
+		assert(GET_PTR_TYPE(p) == PTR_TYPE_NOTE ||
+			GET_PTR_TYPE(p) == PTR_TYPE_SUBTREE);
+		l = (const struct leaf_node *) CLR_PTR_TYPE(p);
+		if (!hashcmp(entry->key_sha1, l->key_sha1))
+			return -1; /* abort insert on matching key */
+		new_node = (struct int_node *)
+			xcalloc(sizeof(struct int_node), 1);
+		ret = note_tree_insert(new_node, n + 1,
+			CLR_PTR_TYPE(p), GET_PTR_TYPE(p));
+		if (ret) {
+			free(new_node);
+			return -1;
+		}
+		tree->a[i] = SET_PTR_TYPE(new_node, PTR_TYPE_INTERNAL);
+		return note_tree_insert(new_node, n + 1, entry, type);
 	}
+}
 
-	index = hash_index(&hash_map, commit_sha1);
-	if (index < 0) {
-		index = -1 - index;
-		hash_map.count++;
+/* Free the entire notes data contained in the given tree */
+static void note_tree_free(struct int_node *tree)
+{
+	unsigned int i;
+	for (i = 0; i < 16; i++) {
+		void *p = tree->a[i];
+		switch(GET_PTR_TYPE(p)) {
+		case PTR_TYPE_INTERNAL:
+			note_tree_free(CLR_PTR_TYPE(p));
+			/* fall through */
+		case PTR_TYPE_NOTE:
+		case PTR_TYPE_SUBTREE:
+			free(CLR_PTR_TYPE(p));
+		}
 	}
+}
 
-	hashcpy(hash_map.entries[index].commit_sha1, commit_sha1);
-	hashcpy(hash_map.entries[index].notes_sha1, notes_sha1);
+/*
+ * Convert a partial SHA1 hex string to the corresponding partial SHA1 value.
+ * - hex      - Partial SHA1 segment in ASCII hex format
+ * - hex_len  - Length of above segment. Must be multiple of 2 between 0 and 40
+ * - sha1     - Partial SHA1 value is written here
+ * - sha1_len - Max #bytes to store in sha1, Must be >= hex_len / 2, and < 20
+ * Returns -1 on error (invalid arguments or invalid SHA1 (not in hex format)).
+ * Otherwise, returns number of bytes written to sha1 (i.e. hex_len / 2).
+ * Pads sha1 with NULs up to sha1_len (not included in returned length).
+ */
+static int get_sha1_hex_segment(const char *hex, unsigned int hex_len,
+		unsigned char *sha1, unsigned int sha1_len)
+{
+	unsigned int i, len = hex_len >> 1;
+	if (hex_len % 2 != 0 || len > sha1_len)
+		return -1;
+	for (i = 0; i < len; i++) {
+		unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
+		if (val & ~0xff)
+			return -1;
+		*sha1++ = val;
+		hex += 2;
+	}
+	for (; i < sha1_len; i++)
+		*sha1++ = 0;
+	return len;
 }
 
-static void initialize_hash_map(const char *notes_ref_name)
+static void load_subtree(struct leaf_node *subtree, struct int_node *node,
+		unsigned int n)
 {
-	unsigned char sha1[20], commit_sha1[20];
-	unsigned mode;
+	unsigned char commit_sha1[20];
+	unsigned int prefix_len;
+	int status;
+	void *buf;
 	struct tree_desc desc;
 	struct name_entry entry;
-	void *buf;
+
+	buf = fill_tree_descriptor(&desc, subtree->val_sha1);
+	if (!buf)
+		die("Could not read %s for notes-index",
+		     sha1_to_hex(subtree->val_sha1));
+
+	prefix_len = subtree->key_sha1[19];
+	assert(prefix_len * 2 >= n);
+	memcpy(commit_sha1, subtree->key_sha1, prefix_len);
+	while (tree_entry(&desc, &entry)) {
+		int len = get_sha1_hex_segment(entry.path, strlen(entry.path),
+				commit_sha1 + prefix_len, 20 - prefix_len);
+		if (len < 0)
+			continue; /* entry.path is not a SHA1 sum. Skip */
+		len += prefix_len;
+
+		/*
+		 * If commit SHA1 is complete (len == 20), assume note object
+		 * If commit SHA1 is incomplete (len < 20), assume note subtree
+		 */
+		if (len <= 20) {
+			unsigned char type = PTR_TYPE_NOTE;
+			struct leaf_node *l = (struct leaf_node *)
+				xcalloc(sizeof(struct leaf_node), 1);
+			hashcpy(l->key_sha1, commit_sha1);
+			hashcpy(l->val_sha1, entry.sha1);
+			if (len < 20) {
+				l->key_sha1[19] = (unsigned char) len;
+				type = PTR_TYPE_SUBTREE;
+			}
+			status = note_tree_insert(node, n, l, type);
+			assert(!status);
+		}
+	}
+	free(buf);
+}
+
+static void initialize_notes(const char *notes_ref_name)
+{
+	unsigned char sha1[20], commit_sha1[20];
+	unsigned mode;
+	struct leaf_node root_tree;
 
 	if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
 	    get_tree_entry(commit_sha1, "", sha1, &mode))
 		return;
 
-	buf = fill_tree_descriptor(&desc, sha1);
-	if (!buf)
-		die("Could not read %s for notes-index", sha1_to_hex(sha1));
-
-	while (tree_entry(&desc, &entry))
-		if (!get_sha1(entry.path, commit_sha1))
-			add_entry(commit_sha1, entry.sha1);
-	free(buf);
+	hashclr(root_tree.key_sha1);
+	hashcpy(root_tree.val_sha1, sha1);
+	load_subtree(&root_tree, &root_node, 0);
 }
 
 static unsigned char *lookup_notes(const unsigned char *commit_sha1)
 {
-	int index;
-
-	if (!hash_map.size)
-		return NULL;
-
-	index = hash_index(&hash_map, commit_sha1);
-	if (index < 0)
-		return NULL;
-	return hash_map.entries[index].notes_sha1;
+	struct leaf_node *found = note_tree_find(&root_node, 0, commit_sha1);
+	if (found)
+		return found->val_sha1;
+	return NULL;
 }
 
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
@@ -120,7 +299,7 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 			notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
 		else if (!notes_ref_name)
 			notes_ref_name = GIT_NOTES_DEFAULT_REF;
-		initialize_hash_map(notes_ref_name);
+		initialize_notes(notes_ref_name);
 		initialized = 1;
 	}
 
@@ -163,7 +342,7 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 
 void free_commit_notes()
 {
-	free(hash_map.entries);
-	memset(&hash_map, 0, sizeof(struct hash_map));
+	note_tree_free(&root_node);
+	memset(&root_node, 0, sizeof(struct int_node));
 	initialized = 0;
 }
-- 
1.6.4.304.g1365c.dirty

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCHv6 12/14] Selftests verifying semantics when loading notes trees with various fanouts
  2009-09-12 15:52   ` Johan Herland
                       ` (11 preceding siblings ...)
  2009-09-12 16:08     ` [PATCHv6 11/14] Teach the notes lookup code to parse notes trees with various fanout schemes Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 16:08     ` [PATCHv6 13/14] Allow flexible organization of notes trees, using both commit date and SHA1 Johan Herland
  2009-09-12 16:08     ` [PATCHv6 14/14] Add test cases for various date-based fanouts Johan Herland
  14 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

Add selftests verifying:
- that we are able to parse notes trees with various fanout schemes
- that notes trees with conflicting fanout schemes are parsed as expected

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t3303-notes-subtrees.sh |  137 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 137 insertions(+), 0 deletions(-)
 create mode 100755 t/t3303-notes-subtrees.sh

diff --git a/t/t3303-notes-subtrees.sh b/t/t3303-notes-subtrees.sh
new file mode 100755
index 0000000..d24203e
--- /dev/null
+++ b/t/t3303-notes-subtrees.sh
@@ -0,0 +1,137 @@
+#!/bin/sh
+
+test_description='Test commit notes organized in subtrees'
+
+. ./test-lib.sh
+
+number_of_commits=100
+
+start_note_commit () {
+	test_tick &&
+	cat <<INPUT_END
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+notes
+COMMIT
+
+from refs/notes/commits^0
+deleteall
+INPUT_END
+
+}
+
+verify_notes () {
+	git log | grep "^    " > output &&
+	i=$number_of_commits &&
+	while [ $i -gt 0 ]; do
+		echo "    commit #$i" &&
+		echo "    note for commit #$i" &&
+		i=$(($i-1));
+	done > expect &&
+	test_cmp expect output
+}
+
+test_expect_success "setup: create $number_of_commits commits" '
+
+	(
+		nr=0 &&
+		while [ $nr -lt $number_of_commits ]; do
+			nr=$(($nr+1)) &&
+			test_tick &&
+			cat <<INPUT_END
+commit refs/heads/master
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit #$nr
+COMMIT
+
+M 644 inline file
+data <<EOF
+file in commit #$nr
+EOF
+
+INPUT_END
+
+		done &&
+		test_tick &&
+		cat <<INPUT_END
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+no notes
+COMMIT
+
+deleteall
+
+INPUT_END
+
+	) |
+	git fast-import --quiet &&
+	git config core.notesRef refs/notes/commits
+'
+
+test_sha1_based () {
+	(
+		start_note_commit &&
+		nr=$number_of_commits &&
+		git rev-list refs/heads/master |
+		while read sha1; do
+			note_path=$(echo "$sha1" | sed "$1")
+			cat <<INPUT_END &&
+M 100644 inline $note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+			nr=$(($nr-1))
+		done
+	) |
+	git fast-import --quiet
+}
+
+test_expect_success 'test notes in 2/38-fanout' 'test_sha1_based "s|^..|&/|"'
+test_expect_success 'verify notes in 2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 4/36-fanout' 'test_sha1_based "s|^....|&/|"'
+test_expect_success 'verify notes in 4/36-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 2/2/36-fanout' 'test_sha1_based "s|^\(..\)\(..\)|\1/\2/|"'
+test_expect_success 'verify notes in 2/2/36-fanout' 'verify_notes'
+
+test_preferred () {
+	(
+		start_note_commit &&
+		nr=$number_of_commits &&
+		git rev-list refs/heads/master |
+		while read sha1; do
+			preferred_note_path=$(echo "$sha1" | sed "$1")
+			ignored_note_path=$(echo "$sha1" | sed "$2")
+			cat <<INPUT_END &&
+M 100644 inline $ignored_note_path
+data <<EOF
+IGNORED note for commit #$nr
+EOF
+
+M 100644 inline $preferred_note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+			nr=$(($nr-1))
+		done
+	) |
+	git fast-import --quiet
+}
+
+test_expect_success 'test notes in 4/36-fanout overriding 2/38-fanout' 'test_preferred "s|^....|&/|" "s|^..|&/|"'
+test_expect_success 'verify notes in 4/36-fanout overriding 2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 2/38-fanout overriding 2/2/36-fanout' 'test_preferred "s|^..|&/|" "s|^\(..\)\(..\)|\1/\2/|"'
+test_expect_success 'verify notes in 2/38-fanout overriding 2/2/36-fanout' 'verify_notes'
+
+test_done
-- 
1.6.4.304.g1365c.dirty

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCHv6 13/14] Allow flexible organization of notes trees, using both commit date and SHA1
  2009-09-12 15:52   ` Johan Herland
                       ` (12 preceding siblings ...)
  2009-09-12 16:08     ` [PATCHv6 12/14] Selftests verifying semantics when loading notes trees with various fanouts Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  2009-09-12 18:41       ` Junio C Hamano
  2009-09-12 16:08     ` [PATCHv6 14/14] Add test cases for various date-based fanouts Johan Herland
  14 siblings, 1 reply; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

This is a major expansion of the notes lookup code to allow for variations
in the notes tree organization. The variations allowed include mixing fanout
schemes based on the commit dates of the annotated commits (aka. date-based
fanout) with fanout schemes based on the SHA1 of the annotated commits (aka.
SHA1-based fanout).

Using date-based fanout in the notes tree structure enables considerable
speedup in the notes lookup process, since notes are almost always looked up
sequentially in the (reverse) chronological order of their associated commits.
Furthermore, organizing notes in a way that allows (near) sequential lookup
enables us to decrease memory consumption both by lazily loading parts of the
notes tree structure on-demand, and freeing parts of the notes structure that
are unlikely to be used again soon.

The new flexible organization of the notes tree changes the rules for valid
note tree entries. The new rules are as follows:

1. Note objects are named by the SHA1 of the commit they annotate, possibly
   split across several SHA1-based fanout levels (this is the same as is
   implemented earlier in this series).

2. Note entries are located within zero or more date-based fanout levels.

3. Date-based fanout schemes may use the year, month and day values of the
   associated commit's timestamp. The values must be prefixed by 'y', 'm'
   and 'd' (respectively) in the notes tree.

4. The date-based components can be combined in one fanout level, or split
   across multiple fanout levels. Individual components may not be split
   across multiple fanout levels.

5. The year/month/date values must be specified in that order, and month or
   date values may not occur without the preceding year or month value.

6. All entries of a tree object in the notes tree structure must follow the
   same scheme used at that level.

Thus, the following example note entries are all valid locations for a note
annotating commit 123456789abcdef0123456789abcdef0123456789 at 2009-09-01:
- 123456789abcdef0123456789abcdef0123456789
- 12/3456789abcdef0123456789abcdef0123456789
- 1234/56789abcdef0123456789abcdef0123456789
- 12/34/56789abcdef0123456789abcdef0123456789
- 1234/5678/9abcdef0123456789abcdef0123456789
- 1234/56/78/9abcdef0123456789abcdef0123456789
- y2009/123456789abcdef0123456789abcdef0123456789
- y2009/m09/12/3456789abcdef0123456789abcdef0123456789
- y2009/m09/d01/123456789abcdef0123456789abcdef0123456789
- y2009m09/12/34/56789abcdef0123456789abcdef0123456789
- y2009m09/d01/1234/5678/9abcdef0123456789abcdef0123456789
- y2009/m09d01/12/34/56/78/9abcdef0123456789abcdef0123456789
- y2009m09d01/123456789abcdef0123456789abcdef0123456789

Conversely, the following example note entries are all invalid:
- 1/23456789abcdef0123456789abcdef0123456789 (violates #1)
- 123/456789abcdef0123456789abcdef0123456789 (violates #1)
- 12/345/6789abcdef0123456789abcdef0123456789 (violates #1)
- y2009123456789abcdef0123456789abcdef0123456789 (violates #2)
- 2009/09/01/123456789abcdef0123456789abcdef0123456789 (violates #3)
- y20/09/m09/12/3456789abcdef0123456789abcdef0123456789 (violates #4)
- y20/09m09/d01/123456789abcdef0123456789abcdef0123456789 (violates #4)
- y2009m/09/12/34/56789abcdef0123456789abcdef0123456789 (violates #4)
- y2009/d01/1234/5678/9abcdef0123456789abcdef0123456789 (violates #5)
- m09/y2009/d01/12/34/56/78/9abcdef0123456789abcdef0123456789 (violates #5)
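
The per-entry date rules (#3 to #5) can be sketched as a small prefix
parser. This is a simplified illustration and not the parser added by this
patch; period_len() and all_digits() are hypothetical names, and the
SHA1-based remainder of the path is assumed to be validated separately.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Are the first n characters of s all decimal digits? Stops at NUL. */
static int all_digits(const char *s, int n)
{
	int i;
	for (i = 0; i < n; i++)
		if (!isdigit((unsigned char)s[i]))
			return 0;
	return 1;
}

/*
 * Return the length of the date-based prefix of 'path' (including its
 * trailing '/'), 0 if the path has no date-based prefix, or -1 if the
 * prefix is malformed.
 */
static int period_len(const char *path)
{
	static const struct { char tag; int width; } parts[] = {
		{ 'y', 4 }, { 'm', 2 }, { 'd', 2 }
	};
	const char *p = path;
	size_t i;

	for (i = 0; i < 3; i++) {
		if (*p != parts[i].tag)
			break;	/* components must appear in y, m, d order */
		if (!all_digits(p + 1, parts[i].width))
			return -1;	/* component split or wrong width */
		p += 1 + parts[i].width;
		if (*p == '/')
			p++;	/* a separator after a component is optional */
	}
	/*
	 * A leftover 'y' or 'm' means a repeated or out-of-order component.
	 * A leftover 'd' is not rejected here, since 'd' is also a hex digit;
	 * e.g. "y2009/d01/..." is left for the SHA1 rules to reject, because
	 * "d01" is not a whole number of bytes.
	 */
	if (*p == 'y' || *p == 'm')
		return -1;
	if (p == path)
		return 0;
	if (p[-1] != '/')
		return -1;	/* date prefix must end at a fanout level */
	return (int)(p - path);
}
```

For example, "y2009m09/d01/1234/..." yields 13 (the length of
"y2009m09/d01/"), while "y2009m/09/..." and "m09/y2009/..." yield -1.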

From rule #6, we see that the following example notes tree is valid:
- y2009m09/0123456789abcdef0123456789abcdef012345678
- y2009m09/123456789abcdef0123456789abcdef0123456789
- y2008m01/d31/23/456789abcdef0123456789abcdef0123456789a
- y2008m01/d31/34/56789abcdef0123456789abcdef0123456789ab
- y2008m01/d16/4567/89abcdef0123456789abcdef0123456789abc
- y2008m01/d16/5678/9abcdef0123456789abcdef0123456789abcd

Conversely the following structure is invalid (violates rule #6):
- y2009m09/0123456789abcdef0123456789abcdef012345678
- y2009m09/12/3456789abcdef0123456789abcdef0123456789
- y2008m01/d31/23/456789abcdef0123456789abcdef0123456789a
- y2008m01/34/56789abcdef0123456789abcdef0123456789ab
- y2008m01/d16/45/6789abcdef0123456789abcdef0123456789abc
- y2008/m01d16/5678/9abcdef0123456789abcdef0123456789abcd

The flexibility added by this patch adds considerable complexity to the notes
tree parser, but the runtime and memory usage are not significantly affected
(except for the effects introduced by the chosen notes tree structure).

Internally, the 16-tree data structure introduced in earlier patches is still
used to hold the SHA1-based fanout levels and the note entries themselves.
However, this patch adds a hierarchical date-based linked-list structure
around the 16-tree structure that mirrors the fanout scheme used in the
actual notes tree.

Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c |  403 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 364 insertions(+), 39 deletions(-)

diff --git a/notes.c b/notes.c
index 6926aa6..a3e3f83 100644
--- a/notes.c
+++ b/notes.c
@@ -7,6 +7,70 @@
 #include "tree-walk.h"
 
 /*
+ * Format of entries in the notes tree structure:
+ *
+ * note-entry   ::= (period sep)? sha1-spec
+ * period       ::= year sep?
+ *                  (month sep?
+ *                   (date sep?)?
+ *                  )?;
+ * year         ::= 'y' yearnum;
+ * month        ::= 'm' monthnum;
+ * date         ::= 'd' datenum;
+ * yearnum      ::= # 4-digit decimal year, from annotated commit's timestamp;
+ * monthnum     ::= # 2-digit decimal month, from annotated commit's timestamp;
+ * datenum      ::= # 2-digit decimal date, from annotated commit's timestamp;
+ * sha1-spec    ::= (hex-fragment sep?){20}
+ * sep          ::= '/';
+ * hex-fragment ::= # Fragment of hexsha1 (2 bytes);
+ * hexsha1      ::= # SHA1 of annotated commit in hex format (40 bytes);
+ *
+ * Thus, the following example note entries are all valid:
+ * - 0123456789abcdef0123456789abcdef012345678
+ * - 01/23456789abcdef0123456789abcdef012345678
+ * - 0123/456789abcdef0123456789abcdef012345678
+ * - 01/23/456789abcdef0123456789abcdef012345678
+ * - 0123/4567/89abcdef0123456789abcdef012345678
+ * - 0123/45/67/89abcdef0123456789abcdef012345678
+ * - y2009/0123456789abcdef0123456789abcdef012345678
+ * - y2009/m09/01/23456789abcdef0123456789abcdef012345678
+ * - y2009/m09/d01/0123456789abcdef0123456789abcdef012345678
+ * - y2009m09/01/23/456789abcdef0123456789abcdef012345678
+ * - y2009m09/d01/0123/4567/89abcdef0123456789abcdef012345678
+ * - y2009/m09d01/01/23/45/67/89abcdef0123456789abcdef012345678
+ * - y2009m09d01/0123456789abcdef0123456789abcdef012345678
+ *
+ * and the following example note entries are all invalid:
+ * - 0/123456789abcdef0123456789abcdef012345678
+ * - 012/3456789abcdef0123456789abcdef012345678
+ * - 01/234/56789abcdef0123456789abcdef012345678
+ * - y20090123456789abcdef0123456789abcdef012345678
+ * - y20/09/m09/01/23456789abcdef0123456789abcdef012345678
+ * - y20/09m09/d01/0123456789abcdef0123456789abcdef012345678
+ * - y2009m/09/01/23/456789abcdef0123456789abcdef012345678
+ * - y2009/d01/0123/4567/89abcdef0123456789abcdef012345678
+ * - m09/y2009/d01/01/23/45/67/89abcdef0123456789abcdef012345678
+ *
+ * In addition to the above per-entry rules, we require that _all_ entries at
+ * a given level in the notes tree (levels are separated by '/') follow the
+ * exact same format at that level. Thus the following structure is valid:
+ * - y2009m09/0123456789abcdef0123456789abcdef012345678
+ * - y2009m09/123456789abcdef0123456789abcdef0123456789
+ * - y2008m01/d31/23/456789abcdef0123456789abcdef0123456789a
+ * - y2008m01/d31/34/56789abcdef0123456789abcdef0123456789ab
+ * - y2008m01/d16/4567/89abcdef0123456789abcdef0123456789abc
+ * - y2008m01/d16/5678/9abcdef0123456789abcdef0123456789abcd
+ *
+ * but the following structure is invalid:
+ * - y2009m09/0123456789abcdef0123456789abcdef012345678
+ * - y2009m09/12/3456789abcdef0123456789abcdef0123456789
+ * - y2008m01/d31/23/456789abcdef0123456789abcdef0123456789a
+ * - y2008m01/34/56789abcdef0123456789abcdef0123456789ab
+ * - y2008m01/d16/45/6789abcdef0123456789abcdef0123456789abc
+ * - y2008/m01d16/5678/9abcdef0123456789abcdef0123456789abcd
+ */
+
+/*
  * Use a non-balancing simple 16-tree structure with struct int_node as
  * internal nodes, and struct leaf_node as leaf nodes. Each int_node has a
  * 16-array of pointers to its children.
@@ -17,9 +81,45 @@
  * - ptr & 3 == 3 - pointer to subtree entry - cast to struct leaf_node *
  *
  * The root node is a statically allocated struct int_node.
+ *
+ * In order to allow date-based fanout schemes in addition to the original
+ * SHA1-based fanout schemes, we need to overload this structure, as follows:
+ * If the first pointer in the 16-array is ~0 (i.e. 0xffffffff on 32-bit
+ * systems and 0xffffffffffffffff on 64-bit systems), then the int_node is NOT
+ * to be interpreted as a 16-array of child node pointers. Rather, the int_node
+ * now represents a period-based node with the following properties:
+ * - The node has a pointer to a "child" node of type struct int_node, which is
+ *   EITHER a "regular" int_node object representing the root node of a 16-tree
+ *   structure holding notes associated with commits with timestamps within
+ *   that time period, OR another period-based int_node representing some
+ *   subdivision of the time period.
+ * - The node also has a pointer to a "previous" period-based int_node, which
+ *   represents the previous time period for which there exist note objects.
+ * - The node has a pointer to a "parent" node, which is the period-based
+ *   int_node that has this int_node as one of its children. This is needed
+ *   when traversing the date-based int_nodes looking for a period matching the
+ *   given commit. For top-level objects, this is set to NULL.
+ * - The node stores the SHA1 sum of the tree object that represents its child
+ *   (within the notes tree structure). Thus, we keep a reference to the child
+ *   structure without necessarily allocating the child node (and
+ *   underlying structure).
+ * - Finally, the node has a period string, which indicates the time period of
+ *   the notes contained within, typically of the form "YYYY", "YYYY-MM" or
+ *   "YYYY-MM-DD", depending on the granularity of the corresponding
+ *   period-based entries in the notes tree structure.
  */
 struct int_node {
-	void *a[16];
+	union {
+		void *a[16];
+		struct {
+			void *magic;  /* ~0 "enables" this part of the union */
+			struct int_node *child;
+			struct int_node *prev;
+			struct int_node *parent;
+			unsigned char tree_sha1[20];
+			char period[11];  /* Enough to hold "YYYY-MM-DD" */
+		};
+	};
 };
 
 /*
@@ -51,12 +151,18 @@ struct leaf_node {
 #define SUBTREE_SHA1_PREFIXCMP(key_sha1, subtree_sha1) \
 	(memcmp(key_sha1, subtree_sha1, subtree_sha1[19]))
 
+#define SUBTREE_DATE_PREFIXCMP(commit_date, subtree_date) \
+	(prefixcmp(commit_date, subtree_date))
+
 static struct int_node root_node;
 
+static struct int_node *cur_node;
+
 static int initialized;
 
-static void load_subtree(struct leaf_node *subtree, struct int_node *node,
-		unsigned int n);
+static void load_subtree(const unsigned char *sha1,
+		const unsigned char *prefix, unsigned int prefix_len,
+		struct int_node *node, struct int_node *parent, int n);
 
 /*
  * To find a leaf_node:
@@ -94,7 +200,8 @@ static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
 		if (!SUBTREE_SHA1_PREFIXCMP(key_sha1, l->key_sha1)) {
 			/* unpack tree and resume search */
 			tree->a[i] = NULL;
-			load_subtree(l, tree, n);
+			load_subtree(l->val_sha1, l->key_sha1, l->key_sha1[19],
+				     tree, NULL, (int) n);
 			free(l);
 			return note_tree_find(tree, n, key_sha1);
 		}
@@ -117,7 +224,8 @@ static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
 	if (!SUBTREE_SHA1_PREFIXCMP(key_sha1, l->key_sha1)) {
 		/* unpack tree and resume search */
 		tree->a[0] = NULL;
-		load_subtree(l, tree, n);
+		load_subtree(l->val_sha1, l->key_sha1, l->key_sha1[19], tree,
+			     NULL, (int) n);
 		free(l);
 		return note_tree_find(tree, n, key_sha1);
 	}
@@ -173,16 +281,28 @@ static int note_tree_insert(struct int_node *tree, unsigned char n,
 /* Free the entire notes data contained in the given tree */
 static void note_tree_free(struct int_node *tree)
 {
-	unsigned int i;
-	for (i = 0; i < 16; i++) {
-		void *p = tree->a[i];
-		switch(GET_PTR_TYPE(p)) {
-		case PTR_TYPE_INTERNAL:
-			note_tree_free(CLR_PTR_TYPE(p));
-			/* fall through */
-		case PTR_TYPE_NOTE:
-		case PTR_TYPE_SUBTREE:
-			free(CLR_PTR_TYPE(p));
+	if (tree->magic == (void *) ~0) {
+		if (tree->prev) {
+			note_tree_free(tree->prev);
+			free(tree->prev);
+		}
+		if (tree->child) {
+			note_tree_free(tree->child);
+			free(tree->child);
+		}
+	}
+	else {
+		unsigned int i;
+		for (i = 0; i < 16; i++) {
+			void *p = tree->a[i];
+			switch(GET_PTR_TYPE(p)) {
+			case PTR_TYPE_INTERNAL:
+				note_tree_free(CLR_PTR_TYPE(p));
+				/* fall through */
+			case PTR_TYPE_NOTE:
+			case PTR_TYPE_SUBTREE:
+				free(CLR_PTR_TYPE(p));
+			}
 		}
 	}
 }
@@ -215,29 +335,139 @@ static int get_sha1_hex_segment(const char *hex, unsigned int hex_len,
 	return len;
 }
 
-static void load_subtree(struct leaf_node *subtree, struct int_node *node,
-		unsigned int n)
+/*
+ * Parse year/month/date strings, and generate the corresponding period string
+ * for the given path entry:
+ * - prefix must follow one of these forms: "", "YYYY", "YYYY-MM"
+ * - path should follow one of these forms: "yYYYY", "yYYYYmMM", "yYYYYmMMdDD",
+ *   "mMMdDD", "mMM" or "dDD"
+ * The resulting string (which follows the form "YYYY", "YYYY-MM" or
+ * "YYYY-MM-DD") is returned as a static string. If path is not valid in the
+ * given (prefix) context, NULL is returned.
+ */
+static const char *parse_period(const char *prefix, unsigned int prefix_len,
+		const char *path, unsigned int path_len)
+{
+	static char result[11];
+	char expect_type;  /* y/m/d for year/month/day-based fanout */
+	unsigned int expect_len, value;
+	char *endptr, *target = result;
+
+	switch (prefix_len) {
+	case 0:
+		/* No prefix, expect year-based fanout in path */
+		expect_type = 'y';
+		expect_len = 4;
+		break;
+	case 4:
+		/* Year in prefix, expect month-based fanout in path */
+		expect_type = 'm';
+		expect_len = 2;
+		break;
+	case 7:
+		/* "YYYY-MM" in prefix, expect day-based fanout in path */
+		expect_type = 'd';
+		expect_len = 2;
+		break;
+	default:
+		die("Date-based notes tree loading invoked with invalid "
+		    "prefix '%.*s'", prefix_len, prefix);
+	}
+
+	if (path[0] != expect_type) {
+		warning("Unexpected entry path in date-based notes tree: '%s' "
+			"(skipping)", path);
+		return NULL;
+	}
+	value = (unsigned int) strtoul(path + 1, &endptr, 10);
+	switch (expect_type) {
+	case 'y':
+		if (value < 1969 || value >= 3000) {
+			warning("Invalid year value in date-based notes tree:"
+				" '%s' (skipping)", path);
+			return NULL;
+		}
+		break;
+	case 'm':
+		if (value < 1 || value > 12) {
+			warning("Invalid month value in date-based notes tree:"
+				" '%s' (skipping)", path);
+			return NULL;
+		}
+		break;
+	case 'd':
+		if (value < 1 || value > 31) {
+			warning("Invalid day value in date-based notes tree:"
+				" '%s' (skipping)", path);
+			return NULL;
+		}
+		break;
+	}
+
+	if (prefix == result) {
+		target = result + prefix_len;
+		prefix = NULL;
+		prefix_len = 0;
+	}
+	prefix_len = snprintf(target, 11, "%.*s%s%0*u", prefix_len, prefix,
+			      expect_len == 2 ? "-" : "", expect_len, value);
+	prefix_len += target - result;
+	assert(prefix_len < 11);
+
+	if (*endptr)  /* there are more components in this path */
+		return parse_period(result, prefix_len, endptr,
+				    path_len - (endptr - path));
+	return result;
+}
+
+static void load_date_subtree(struct tree_desc *tree_desc,
+		const char *prefix, unsigned int prefix_len,
+		struct int_node *node, struct int_node *parent)
+{
+	struct name_entry entry;
+	struct int_node *cur_node = NULL;
+	struct int_node *new_node;
+
+	while (tree_entry(tree_desc, &entry)) {
+		const char *period = parse_period(
+			prefix, prefix_len, entry.path, strlen(entry.path));
+		if (!period)
+			continue;
+		if (tree_desc->size)  /* this is not the last tree entry */
+			new_node = (struct int_node *)
+				xmalloc(sizeof(struct int_node));
+		else  /* this is the last entry, store directly into node */
+			new_node = node;
+
+		new_node->magic = (void *) ~0;
+		new_node->child = NULL;
+		new_node->prev = cur_node;
+		new_node->parent = parent;
+		hashcpy(new_node->tree_sha1, entry.sha1);
+		strcpy(new_node->period, period);
+		cur_node = new_node;
+	}
+	assert(!cur_node || cur_node == node);
+}
+
+static void load_sha1_subtree(struct tree_desc *tree_desc,
+		const unsigned char *prefix, unsigned int prefix_len,
+		struct int_node *node, unsigned char n)
 {
 	unsigned char commit_sha1[20];
-	unsigned int prefix_len;
 	int status;
-	void *buf;
-	struct tree_desc desc;
 	struct name_entry entry;
 
-	buf = fill_tree_descriptor(&desc, subtree->val_sha1);
-	if (!buf)
-		die("Could not read %s for notes-index",
-		     sha1_to_hex(subtree->val_sha1));
-
-	prefix_len = subtree->key_sha1[19];
 	assert(prefix_len * 2 >= n);
-	memcpy(commit_sha1, subtree->key_sha1, prefix_len);
-	while (tree_entry(&desc, &entry)) {
+	memcpy(commit_sha1, prefix, prefix_len);
+	while (tree_entry(tree_desc, &entry)) {
 		int len = get_sha1_hex_segment(entry.path, strlen(entry.path),
 				commit_sha1 + prefix_len, 20 - prefix_len);
-		if (len < 0)
+		if (len < 0) {
+			warning("Invalid value in notes tree: '%s' (skipping)",
+				entry.path);
 			continue; /* entry.path is not a SHA1 sum. Skip */
+		}
 		len += prefix_len;
 
 		/*
@@ -258,6 +488,42 @@ static void load_subtree(struct leaf_node *subtree, struct int_node *node,
 			assert(!status);
 		}
 	}
+}
+
+static void load_subtree(const unsigned char *sha1,
+		const unsigned char *prefix, unsigned int prefix_len,
+		struct int_node *node, struct int_node *parent, int n)
+{
+	void *buf;
+	struct tree_desc desc;
+
+	buf = fill_tree_descriptor(&desc, sha1);
+	if (!buf)
+		die("Could not read notes subtree at %s", sha1_to_hex(sha1));
+	/*
+	 * After fill_tree_descriptor(), we can peek at the first tree entry
+	 * in desc.entry.
+	 */
+	switch (desc.entry.path[0]) {
+	case 'd':
+		if (strlen(desc.entry.path) != 3)
+			break;
+		/* fall-through */
+	case 'm':
+	case 'y':
+		/* path cannot be a SHA1 fragment */
+		load_date_subtree(&desc, (const char *) prefix, prefix_len,
+				  node, parent);
+		free(buf);
+		return;
+	}
+	if (n < 0) {
+		/* Arriving from a date-based subtree; reset prefix */
+		n = 0;
+		prefix = NULL;
+		prefix_len = 0;
+	}
+	load_sha1_subtree(&desc, prefix, prefix_len, node, n);
 	free(buf);
 }
 
@@ -265,23 +531,81 @@ static void initialize_notes(const char *notes_ref_name)
 {
 	unsigned char sha1[20], commit_sha1[20];
 	unsigned mode;
-	struct leaf_node root_tree;
 
 	if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
 	    get_tree_entry(commit_sha1, "", sha1, &mode))
 		return;
 
-	hashclr(root_tree.key_sha1);
-	hashcpy(root_tree.val_sha1, sha1);
-	load_subtree(&root_tree, &root_node, 0);
+	load_subtree(sha1, NULL, 0, &root_node, NULL, 0);
+	cur_node = &root_node;
 }
 
-static unsigned char *lookup_notes(const unsigned char *commit_sha1)
+static unsigned char *lookup_notes(const struct commit *commit)
 {
-	struct leaf_node *found = note_tree_find(&root_node, 0, commit_sha1);
-	if (found)
-		return found->val_sha1;
-	return NULL;
+	struct int_node *node = cur_node, *seen_node = cur_node;
+	struct leaf_node *found;
+	const char *short_date;
+
+	if (!node)
+		return NULL;
+
+	/* Convert commit->date to YYYY-MM-DD format */
+	short_date = show_date(commit->date, 0, DATE_SHORT);
+
+	while (node->magic == (void *) ~0) {  /* date-based node */
+		int cmp = SUBTREE_DATE_PREFIXCMP(short_date, node->period);
+		if (cmp == 0) {
+			/* Search inside child node */
+			if (!node->child) {
+				/* Must unpack child node first */
+				node->child = (struct int_node *)
+					xcalloc(sizeof(struct int_node), 1);
+				load_subtree(node->tree_sha1,
+					(const unsigned char *) node->period,
+					strlen(node->period), node->child,
+					node, -1);
+			}
+			seen_node = node;
+			node = node->child;
+		}
+		else if (cmp > 0) {
+			/* Search in past node */
+			if (node->prev)
+				node = node->prev;
+			else
+				node = node->parent;
+		}
+		else {
+			/* Search in future node */
+			if (!node->parent) {
+				/* Restart from root_node */
+				seen_node = node;
+				node = &root_node;
+			}
+			else
+				node = node->parent;
+		}
+		if (!node || node == seen_node) {
+			/* We've been here before, give up search */
+			return NULL;
+		}
+	}
+	while (cur_node &&
+	       SUBTREE_DATE_PREFIXCMP(cur_node->period, seen_node->period) < 0)
+	{
+		/*
+		 * We're about to move cur_node backwards in history. We are
+		 * unlikely to need this cur_node in the future, so free() it.
+		 */
+		note_tree_free(cur_node->child);
+		cur_node->child = NULL;
+		cur_node = cur_node->parent;
+	}
+	cur_node = seen_node;
+
+	/* Drill down further with SHA1-based lookup */
+	found = note_tree_find(node, 0, commit->object.sha1);
+	return found ? found->val_sha1 : NULL;
 }
 
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
@@ -303,7 +627,7 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 		initialized = 1;
 	}
 
-	sha1 = lookup_notes(commit->object.sha1);
+	sha1 = lookup_notes(commit);
 	if (!sha1)
 		return;
 
@@ -342,6 +666,7 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 
 void free_commit_notes()
 {
+	cur_node = NULL;
 	note_tree_free(&root_node);
 	memset(&root_node, 0, sizeof(struct int_node));
 	initialized = 0;
-- 
1.6.4.304.g1365c.dirty
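
The date-based walk in lookup_notes() hinges on one comparison: does the
commit's YYYY-MM-DD date start with the node's period string, and if not,
in which direction should the search continue? Below is a minimal
standalone sketch of that decision. date_prefixcmp() is a stand-in written
for this example; it assumes the sign convention of git's prefixcmp() at the
time (prefix byte minus string byte at the first mismatch). The enum and
classify() are hypothetical names and not part of the patch.

```c
#include <string.h>

/*
 * Stand-in for prefixcmp(): returns 0 when str starts with prefix,
 * otherwise (prefix byte - str byte) at the first mismatch.
 * This sign convention is an assumption of the sketch.
 */
static int date_prefixcmp(const char *str, const char *prefix)
{
	for (; ; str++, prefix++) {
		if (!*prefix)
			return 0;
		if (*str != *prefix)
			return (unsigned char)*prefix - (unsigned char)*str;
	}
}

/* Hypothetical names for the three branches of the lookup loop. */
enum direction { MATCH, LOOK_EARLIER, LOOK_LATER };

/* Decide where a commit dated short_date lies relative to period. */
static enum direction classify(const char *short_date, const char *period)
{
	int cmp = date_prefixcmp(short_date, period);

	if (cmp == 0)
		return MATCH;         /* descend into this period's child */
	if (cmp > 0)
		return LOOK_EARLIER;  /* period is later than the commit */
	return LOOK_LATER;            /* period is earlier than the commit */
}
```

Under this convention, classify("2008-01-31", "2009") yields LOOK_EARLIER,
which corresponds to the "cmp > 0" branch ("Search in past node") above.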


* [PATCHv6 14/14] Add test cases for various date-based fanouts
  2009-09-12 15:52   ` Johan Herland
                       ` (13 preceding siblings ...)
  2009-09-12 16:08     ` [PATCHv6 13/14] Allow flexible organization of notes trees, using both commit date and SHA1 Johan Herland
@ 2009-09-12 16:08     ` Johan Herland
  14 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 16:08 UTC (permalink / raw)
  To: gitster
  Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
	spearce

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t3303-notes-subtrees.sh |   64 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 64 insertions(+), 0 deletions(-)

diff --git a/t/t3303-notes-subtrees.sh b/t/t3303-notes-subtrees.sh
index d24203e..d3cbf6d 100755
--- a/t/t3303-notes-subtrees.sh
+++ b/t/t3303-notes-subtrees.sh
@@ -134,4 +134,68 @@ test_expect_success 'verify notes in 4/36-fanout overriding 2/38-fanout' 'verify
 test_expect_success 'test notes in 2/38-fanout overriding 2/2/36-fanout' 'test_preferred "s|^..|&/|" "s|^\(..\)\(..\)|\1/\2/|"'
 test_expect_success 'verify notes in 2/38-fanout overriding 2/2/36-fanout' 'verify_notes'
 
+test_date_based () {
+	(
+		start_note_commit &&
+		nr=$number_of_commits &&
+		git log --format="%H %ct" refs/heads/master |
+		while read sha1 date_t; do
+			date=$(date -u -d "@$date_t" +"$1")
+			note_path="$date/$(echo "$sha1" | sed "$2")"
+			cat <<INPUT_END &&
+M 100644 inline $note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+			nr=$(($nr-1))
+		done
+	) |
+	git fast-import --quiet
+}
+
+test_expect_success 'test notes in y/40-fanout' 'test_date_based "y%Y" ""'
+test_expect_success 'verify notes in y/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/2/38-fanout' 'test_date_based "y%Y" "s|^..|&/|"'
+test_expect_success 'verify notes in y/2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in ym/40-fanout' 'test_date_based "y%Ym%m" ""'
+test_expect_success 'verify notes in ym/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in ym/2/38-fanout' 'test_date_based "y%Ym%m" "s|^..|&/|"'
+test_expect_success 'verify notes in ym/2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in ymd/40-fanout' 'test_date_based "y%Ym%md%d" ""'
+test_expect_success 'verify notes in ymd/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in ymd/2/38-fanout' 'test_date_based "y%Ym%md%d" "s|^..|&/|"'
+test_expect_success 'verify notes in ymd/2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/m/40-fanout' 'test_date_based "y%Y/m%m" ""'
+test_expect_success 'verify notes in y/m/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/m/2/38-fanout' 'test_date_based "y%Y/m%m" "s|^..|&/|"'
+test_expect_success 'verify notes in y/m/2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/md/40-fanout' 'test_date_based "y%Y/m%md%d" ""'
+test_expect_success 'verify notes in y/md/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/md/2/38-fanout' 'test_date_based "y%Y/m%md%d" "s|^..|&/|"'
+test_expect_success 'verify notes in y/md/2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in ym/d/40-fanout' 'test_date_based "y%Ym%m/d%d" ""'
+test_expect_success 'verify notes in ym/d/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in ym/d/2/38-fanout' 'test_date_based "y%Ym%m/d%d" "s|^..|&/|"'
+test_expect_success 'verify notes in ym/d/2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/m/d/40-fanout' 'test_date_based "y%Y/m%m/d%d" ""'
+test_expect_success 'verify notes in y/m/d/40-fanout' 'verify_notes'
+
+test_expect_success 'test notes in y/m/d/2/38-fanout' 'test_date_based "y%Y/m%m/d%d" "s|^..|&/|"'
+test_expect_success 'verify notes in y/m/d/2/38-fanout' 'verify_notes'
+
 test_done
-- 
1.6.4.304.g1365c.dirty
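
The fast-import input generated by test_date_based places each note at
"<date part>/<SHA1 part>". The same path construction can be sketched in C.
note_path() is a hypothetical helper (not part of git) that formats the
commit time in UTC with strftime() and optionally splits off the first two
hex digits, as the sed expression "s|^..|&/|" does in the tests.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: build "<date>/<sha1>" or "<date>/<xx>/<rest>".
 * date_fmt is a strftime() format such as "y%Y/m%m/d%d".
 * Returns 0 on success, -1 on any formatting or size error.
 */
static int note_path(char *out, size_t out_len, time_t commit_time,
		     const char *date_fmt, const char *sha1_hex, int split)
{
	char date[32];
	int n;
	struct tm *tm = gmtime(&commit_time);  /* UTC, like "date -u" */

	if (!tm || !strftime(date, sizeof(date), date_fmt, tm))
		return -1;
	if (split && strlen(sha1_hex) < 2)
		return -1;

	if (split)  /* 2/38-style split of the SHA1 part */
		n = snprintf(out, out_len, "%s/%.2s/%s",
			     date, sha1_hex, sha1_hex + 2);
	else
		n = snprintf(out, out_len, "%s/%s", date, sha1_hex);

	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

For a commit timestamp of 1252368000 (2009-09-08 00:00:00 UTC) and the
format "y%Ym%m/d%d" with split enabled, the date part comes out as
"y2009m09/d08", followed by the first two hex digits as their own level.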


* Re: [PATCHv5 00/14] git notes
  2009-09-12 15:50   ` Johan Herland
@ 2009-09-12 18:11     ` Shawn O. Pearce
  2009-09-12 18:35       ` Johan Herland
  0 siblings, 1 reply; 58+ messages in thread
From: Shawn O. Pearce @ 2009-09-12 18:11 UTC (permalink / raw)
  To: Johan Herland
  Cc: git, gitster, Johannes.Schindelin, trast, tavestbo, git, chriscool

Johan Herland <johan@herland.net> wrote:
> Shawn, do you have any additional defence for the date-based fanout?

No.

The only defense I have for it is "it sounds like a nice theory
given access patterns", and the note about memory usage you made,
but which I clipped to keep this email shorter. :-)

It was only a theory I tossed out there in a back-seat-driver
sort of way.  Your results show my hunch was correct, it may help.
But they also say it may not help enough to justify the complexity,
so I now agree with you that SHA-1 fan out may be good enough.

> Are there untested reasonable scenarios that would show the benefits of
> date-based fanout?

I don't think there are, your tests were pretty good at covering
things.

> How does the plan for notes usage in your code-review thingy 
> compare to my test scenario?

I think your tests may still have been too low in volume, 115k notes
isn't a lot.  Based on the distributions I was looking at before,
I could be seeing a growth of >100k notes/year.  Ask me again in
5 years if 115k notes is a lot. :-)

But we all know that SHA-1 distributes data quite well, so the SHA-1
fan-out may just need to change from 2_38 to 2_2_2_34 (or something)
to handle that larger volume.
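
To make the 2_38 versus 2_2_2_34 notation concrete: each fanout level
peels two hex digits off the front of the SHA-1 and turns them into a
directory. A small sketch follows; fanout_path() is a hypothetical helper
written for illustration and is not taken from the notes code.

```c
#include <string.h>

/*
 * Hypothetical helper: split a 40-char hex SHA-1 into `levels` two-digit
 * directory components followed by the remaining digits.
 * levels == 1 gives 2/38, levels == 3 gives 2/2/2/34.
 * Returns 0 on success, -1 on bad input or a too-small buffer.
 */
static int fanout_path(const char *hex, unsigned int levels,
		       char *out, size_t out_len)
{
	size_t hex_len = strlen(hex), pos = 0, i;

	if (hex_len != 40 || levels > 19 || out_len < hex_len + levels + 1)
		return -1;

	for (i = 0; i < hex_len; i++) {
		/* insert a '/' after each of the first `levels` pairs */
		if (i && i % 2 == 0 && i / 2 <= levels)
			out[pos++] = '/';
		out[pos++] = hex[i];
	}
	out[pos] = '\0';
	return 0;
}
```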

-- 
Shawn.


* Re: [PATCHv5 00/14] git notes
  2009-09-12 18:11     ` Shawn O. Pearce
@ 2009-09-12 18:35       ` Johan Herland
  0 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 18:35 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: git, gitster, Johannes.Schindelin, trast, tavestbo, git, chriscool

On Saturday 12 September 2009, Shawn O. Pearce wrote:
> Johan Herland <johan@herland.net> wrote:
> > Shawn, do you have any additional defence for the date-based fanout?
> 
> No.
> 
> The only defense I have for it is "it sounds like a nice theory
> given access patterns", and the note about memory usage you made,
> but which I clipped to keep this email shorter. :-)
> 
> It was only a theory I tossed out there in a back-seat-driver
> sort of way.  Your results show my hunch was correct, it may help.
> But they also say it may not help enough to justify the complexity,
> so I now agree with you that SHA-1 fan out may be good enough.

Ok, so I guess we can drop the flexible part of the notes code. Junio: Feel free
to drop the last two patches from the jh/notes series.

> > How does the plan for notes usage in your code-review thingy
> > compare to my test scenario?
> 
> I think your tests may still have been too low in volume, 115k notes
> isn't a lot.  Based on the distributions I was looking at before,
> I could be seeing a growth of >100k notes/year.  Ask me again in
> 5 years if 115k notes is a lot. :-)
> 
> But we all know that SHA-1 distributes data quite well, so the SHA-1
> fan-out may just need to change from 2_38 to 2_2_2_34 (or something)
> to handle that larger volume.

Yes, I expect that the optimal number of entries per tree level is ~256, so 
if we add an upper threshold at ~300 (where we start using another fanout 
level), and a lower threshold at ~200 (where we consolidate subtrees and put 
all into this level), the (still-to-be-written) writing part of the notes 
code should automatically adjust the notes tree to the optimal layout.

With those assumptions, and a growth of 100k notes/year, a 2/2/36 fanout 
should last you ~150 years, and a 2/2/2/34 fanout should be enough for the 
next ~40,000 years... ;)
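
The arithmetic behind those estimates can be checked with a few lines of C.
The sketch assumes, as one reading of the figures above, that every tree
level (including the leaf trees that hold the notes) contains 256 entries;
notes_capacity() and years_until_full() are hypothetical names.

```c
/*
 * Notes that fit in a tree with `fanout_levels` directory levels when
 * every tree, including the leaf trees, holds `per_tree` entries.
 * No overflow check: intended for small inputs only.
 */
static unsigned long long notes_capacity(unsigned int fanout_levels,
					 unsigned int per_tree)
{
	unsigned long long total = per_tree;  /* notes in one leaf tree */

	while (fanout_levels--)
		total *= per_tree;
	return total;
}

/* Whole years until that capacity is reached at a constant growth rate. */
static unsigned long long years_until_full(unsigned int fanout_levels,
					   unsigned int per_tree,
					   unsigned long long notes_per_year)
{
	if (!notes_per_year)
		return 0;
	return notes_capacity(fanout_levels, per_tree) / notes_per_year;
}
```

With 256 entries per tree and 100,000 notes/year, two fanout levels
(2/2/36) give 256^3 = 16,777,216 notes, or about 167 years, and three
levels (2/2/2/34) give 256^4 = 4,294,967,296 notes, or about 42,949 years,
in line with the rough figures quoted above.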


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCHv6 10/14] Teach notes code to free its internal data structures on request.
  2009-09-12 16:08     ` [PATCHv6 10/14] Teach notes code to free its internal data structures on request Johan Herland
@ 2009-09-12 18:40       ` Junio C Hamano
  2009-09-12 22:21         ` Johan Herland
  0 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2009-09-12 18:40 UTC (permalink / raw)
  To: Johan Herland
  Cc: gitster, git, Johannes.Schindelin, trast, tavestbo, git,
	chriscool, spearce

Johan Herland <johan@herland.net> writes:

> There's no need to be rude to memory-conscious callers...

Will squash this in.

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Date: Sat, 12 Sep 2009 11:34:24 -0700
Subject: [PATCH] notes.[ch] fixup: avoid old-style declaration

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 notes.c |    2 +-
 notes.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/notes.c b/notes.c
index 008c3d4..9ed2c87 100644
--- a/notes.c
+++ b/notes.c
@@ -161,7 +161,7 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 	free(msg);
 }
 
-void free_commit_notes()
+void free_commit_notes(void)
 {
 	free(hash_map.entries);
 	memset(&hash_map, 0, sizeof(struct hash_map));
diff --git a/notes.h b/notes.h
index 41802e5..d1dd1d1 100644
--- a/notes.h
+++ b/notes.h
@@ -7,6 +7,6 @@
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 		const char *output_encoding, int flags);
 
-void free_commit_notes();
+void free_commit_notes(void);
 
 #endif
-- 
1.6.5.rc0.82.g1c5d9


* Re: [PATCHv6 13/14] Allow flexible organization of notes trees, using both commit date and SHA1
  2009-09-12 16:08     ` [PATCHv6 13/14] Allow flexible organization of notes trees, using both commit date and SHA1 Johan Herland
@ 2009-09-12 18:41       ` Junio C Hamano
  2009-09-12 22:33         ` Johan Herland
  0 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2009-09-12 18:41 UTC (permalink / raw)
  To: Johan Herland
  Cc: gitster, git, Johannes.Schindelin, trast, tavestbo, git,
	chriscool, spearce

Johan Herland <johan@herland.net> writes:

> This is a major expansion of the notes lookup code to allow for variations
> in the notes tree organization. The variations allowed include mixing fanout
> schemes based on the commit dates of the annotated commits (aka. date-based
> fanout) with fanout schemes based on the SHA1 of the annotated commits (aka.
> SHA1-based fanout).

Will squash this in.

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Date: Sat, 12 Sep 2009 11:36:42 -0700
Subject: [PATCH] notes.[ch] fixup: avoid unnamed struct members

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 notes.c |   82 +++++++++++++++++++++++++++++++-------------------------------
 1 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/notes.c b/notes.c
index c573ffa..9a92853 100644
--- a/notes.c
+++ b/notes.c
@@ -118,8 +118,8 @@ struct int_node {
 			struct int_node *parent;
 			unsigned char tree_sha1[20];
 			char period[11];  /* Enough to hold "YYYY-MM-DD" */
-		};
-	};
+		} s;
+	} u;
 };
 
 /*
@@ -182,7 +182,7 @@ static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
 {
 	struct leaf_node *l;
 	unsigned char i = GET_NIBBLE(n, key_sha1);
-	void *p = tree->a[i];
+	void *p = tree->u.a[i];
 
 	switch(GET_PTR_TYPE(p)) {
 	case PTR_TYPE_INTERNAL:
@@ -199,7 +199,7 @@ static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
 		l = (struct leaf_node *) CLR_PTR_TYPE(p);
 		if (!SUBTREE_SHA1_PREFIXCMP(key_sha1, l->key_sha1)) {
 			/* unpack tree and resume search */
-			tree->a[i] = NULL;
+			tree->u.a[i] = NULL;
 			load_subtree(l->val_sha1, l->key_sha1, l->key_sha1[19],
 				     tree, NULL, (int) n);
 			free(l);
@@ -214,16 +214,16 @@ static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
 
 	/*
 	 * Did not find key at this (or any lower) level.
-	 * Check if there's a matching subtree entry in tree->a[0].
+	 * Check if there's a matching subtree entry in tree->u.a[0].
 	 * If so, unpack tree and resume search.
 	 */
-	p = tree->a[0];
+	p = tree->u.a[0];
 	if (GET_PTR_TYPE(p) != PTR_TYPE_SUBTREE)
 		return NULL;
 	l = (struct leaf_node *) CLR_PTR_TYPE(p);
 	if (!SUBTREE_SHA1_PREFIXCMP(key_sha1, l->key_sha1)) {
 		/* unpack tree and resume search */
-		tree->a[0] = NULL;
+		tree->u.a[0] = NULL;
 		load_subtree(l->val_sha1, l->key_sha1, l->key_sha1[19], tree,
 			     NULL, (int) n);
 		free(l);
@@ -250,12 +250,12 @@ static int note_tree_insert(struct int_node *tree, unsigned char n,
 	const struct leaf_node *l;
 	int ret;
 	unsigned char i = GET_NIBBLE(n, entry->key_sha1);
-	void *p = tree->a[i];
+	void *p = tree->u.a[i];
 	assert(GET_PTR_TYPE(entry) == PTR_TYPE_NULL);
 	switch(GET_PTR_TYPE(p)) {
 	case PTR_TYPE_NULL:
 		assert(!p);
-		tree->a[i] = SET_PTR_TYPE(entry, type);
+		tree->u.a[i] = SET_PTR_TYPE(entry, type);
 		return 0;
 	case PTR_TYPE_INTERNAL:
 		return note_tree_insert(CLR_PTR_TYPE(p), n + 1, entry, type);
@@ -273,7 +273,7 @@ static int note_tree_insert(struct int_node *tree, unsigned char n,
 			free(new_node);
 			return -1;
 		}
-		tree->a[i] = SET_PTR_TYPE(new_node, PTR_TYPE_INTERNAL);
+		tree->u.a[i] = SET_PTR_TYPE(new_node, PTR_TYPE_INTERNAL);
 		return note_tree_insert(new_node, n + 1, entry, type);
 	}
 }
@@ -281,20 +281,20 @@ static int note_tree_insert(struct int_node *tree, unsigned char n,
 /* Free the entire notes data contained in the given tree */
 static void note_tree_free(struct int_node *tree)
 {
-	if (tree->magic == (void *) ~0) {
-		if (tree->prev) {
-			note_tree_free(tree->prev);
-			free(tree->prev);
+	if (tree->u.s.magic == (void *) ~0) {
+		if (tree->u.s.prev) {
+			note_tree_free(tree->u.s.prev);
+			free(tree->u.s.prev);
 		}
-		if (tree->child) {
-			note_tree_free(tree->child);
-			free(tree->child);
+		if (tree->u.s.magic) {
+			note_tree_free(tree->u.s.magic);
+			free(tree->u.s.magic);
 		}
 	}
 	else {
 		unsigned int i;
 		for (i = 0; i < 16; i++) {
-			void *p = tree->a[i];
+			void *p = tree->u.a[i];
 			switch(GET_PTR_TYPE(p)) {
 			case PTR_TYPE_INTERNAL:
 				note_tree_free(CLR_PTR_TYPE(p));
@@ -439,12 +439,12 @@ static void load_date_subtree(struct tree_desc *tree_desc,
 		else  /* this is the last entry, store directly into node */
 			new_node = node;
 
-		new_node->magic = (void *) ~0;
-		new_node->child = NULL;
-		new_node->prev = cur_node;
-		new_node->parent = parent;
-		hashcpy(new_node->tree_sha1, entry.sha1);
-		strcpy(new_node->period, period);
+		new_node->u.s.magic = (void *) ~0;
+		new_node->u.s.magic = NULL;
+		new_node->u.s.prev = cur_node;
+		new_node->u.s.parent = parent;
+		hashcpy(new_node->u.s.tree_sha1, entry.sha1);
+		strcpy(new_node->u.s.period, period);
 		cur_node = new_node;
 	}
 	assert(!cur_node || cur_node == node);
@@ -552,38 +552,38 @@ static unsigned char *lookup_notes(const struct commit *commit)
 	/* Convert commit->date to YYYY-MM-DD format */
 	short_date = show_date(commit->date, 0, DATE_SHORT);
 
-	while (node->magic == (void *) ~0) {  /* date-based node */
-		int cmp = SUBTREE_DATE_PREFIXCMP(short_date, node->period);
+	while (node->u.s.magic == (void *) ~0) {  /* date-based node */
+		int cmp = SUBTREE_DATE_PREFIXCMP(short_date, node->u.s.period);
 		if (cmp == 0) {
 			/* Search inside child node */
-			if (!node->child) {
+			if (!node->u.s.magic) {
 				/* Must unpack child node first */
-				node->child = (struct int_node *)
+				node->u.s.magic = (struct int_node *)
 					xcalloc(sizeof(struct int_node), 1);
-				load_subtree(node->tree_sha1,
-					(const unsigned char *) node->period,
-					strlen(node->period), node->child,
+				load_subtree(node->u.s.tree_sha1,
+					(const unsigned char *) node->u.s.period,
+					strlen(node->u.s.period), node->u.s.magic,
 					node, -1);
 			}
 			seen_node = node;
-			node = node->child;
+			node = node->u.s.magic;
 		}
 		else if (cmp > 0) {
 			/* Search in past node */
-			if (node->prev)
-				node = node->prev;
+			if (node->u.s.prev)
+				node = node->u.s.prev;
 			else
-				node = node->parent;
+				node = node->u.s.parent;
 		}
 		else {
 			/* Search in future node */
-			if (!node->parent) {
+			if (!node->u.s.parent) {
 				/* Restart from root_node */
 				seen_node = node;
 				node = &root_node;
 			}
 			else
-				node = node->parent;
+				node = node->u.s.parent;
 		}
 		if (!node || node == seen_node) {
 			/* We've been here before, give up search */
@@ -591,15 +591,15 @@ static unsigned char *lookup_notes(const struct commit *commit)
 		}
 	}
 	while (cur_node &&
-	       SUBTREE_DATE_PREFIXCMP(cur_node->period, seen_node->period) < 0)
+	       SUBTREE_DATE_PREFIXCMP(cur_node->u.s.period, seen_node->u.s.period) < 0)
 	{
 		/*
 		 * We're about to move cur_node backwards in history. We are
 		 * unlikely to need this cur_node in the future, so free() it.
 		 */
-		note_tree_free(cur_node->child);
-		cur_node->child = NULL;
-		cur_node = cur_node->parent;
+		note_tree_free(cur_node->u.s.magic);
+		cur_node->u.s.magic = NULL;
+		cur_node = cur_node->u.s.parent;
 	}
 	cur_node = seen_node;
 
-- 
1.6.5.rc0.82.g1c5d9
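
Background on the naming: anonymous structs and unions only became
standard with C11, so giving both a name keeps the code buildable with
older compilers. The following is a simplified, standalone sketch of the
overloaded node with named members. PERIOD_MAGIC, is_period_node() and
init_period_node() are hypothetical names introduced for this example,
and the sentinel is spelled with uintptr_t here rather than "(void *) ~0"
as in the patch.

```c
#include <stdint.h>
#include <string.h>

struct int_node {
	union {
		void *a[16];             /* regular 16-tree node */
		struct {
			void *magic;     /* all-ones marks a period node */
			struct int_node *child;
			struct int_node *prev;
			struct int_node *parent;
			unsigned char tree_sha1[20];
			char period[11]; /* holds up to "YYYY-MM-DD" */
		} s;
	} u;
};

/* Hypothetical spelling of the all-ones sentinel pointer. */
#define PERIOD_MAGIC ((void *) ~(uintptr_t) 0)

/* A node is period-based when its first pointer slot is the sentinel. */
static int is_period_node(const struct int_node *node)
{
	return node->u.s.magic == PERIOD_MAGIC;
}

/* Turn node into an empty period node; returns -1 if period is too long. */
static int init_period_node(struct int_node *node, const char *period,
			    struct int_node *parent)
{
	if (strlen(period) >= sizeof(node->u.s.period))
		return -1;
	memset(node, 0, sizeof(*node));
	node->u.s.magic = PERIOD_MAGIC;
	node->u.s.parent = parent;
	strcpy(node->u.s.period, period);  /* length checked above */
	return 0;
}
```

Every access then goes through u.a[...] for a regular node and u.s.<member>
for a period node, so the member name must be carried over unchanged when
converting from the unnamed form.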


* Re: [PATCHv6 10/14] Teach notes code to free its internal data structures on request.
  2009-09-12 18:40       ` Junio C Hamano
@ 2009-09-12 22:21         ` Johan Herland
  0 siblings, 0 replies; 58+ messages in thread
From: Johan Herland @ 2009-09-12 22:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes.Schindelin, trast, tavestbo, git, chriscool, spearce

On Saturday 12 September 2009, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > There's no need to be rude to memory-conscious callers...
> 
> Will squash this in.
> 
> -- >8 --
> From: Junio C Hamano <gitster@pobox.com>
> Date: Sat, 12 Sep 2009 11:34:24 -0700
> Subject: [PATCH] notes.[ch] fixup: avoid old-style declaration
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  notes.c |    2 +-
>  notes.h |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/notes.c b/notes.c
> index 008c3d4..9ed2c87 100644
> --- a/notes.c
> +++ b/notes.c
> @@ -161,7 +161,7 @@ void get_commit_notes(const struct commit *commit,
>  struct strbuf *sb, free(msg);
>  }
> 
> -void free_commit_notes()
> +void free_commit_notes(void)
>  {
>  	free(hash_map.entries);
>  	memset(&hash_map, 0, sizeof(struct hash_map));
> diff --git a/notes.h b/notes.h
> index 41802e5..d1dd1d1 100644
> --- a/notes.h
> +++ b/notes.h
> @@ -7,6 +7,6 @@
>  void get_commit_notes(const struct commit *commit, struct strbuf *sb,
>  		const char *output_encoding, int flags);
> 
> -void free_commit_notes();
> +void free_commit_notes(void);
> 
>  #endif

Thanks,

Acked-by: Johan Herland <johan@herland.net>


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCHv6 13/14] Allow flexible organization of notes trees, using both commit date and SHA1
  2009-09-12 18:41       ` Junio C Hamano
@ 2009-09-12 22:33         ` Johan Herland
  2009-09-12 23:37           ` Junio C Hamano
  0 siblings, 1 reply; 58+ messages in thread
From: Johan Herland @ 2009-09-12 22:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes.Schindelin, trast, tavestbo, git, chriscool, spearce

On Saturday 12 September 2009, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > This is a major expansion of the notes lookup code to allow for
> > variations in the notes tree organization. The variations allowed
> > include mixing fanout schemes based on the commit dates of the
> > annotated commits (aka. date-based fanout) with fanout schemes based on
> > the SHA1 of the annotated commits (aka. SHA1-based fanout).

Note that this patch is about to be removed from this series, cf. today's 
discussion with Shawn elsewhere in this thread.

> Will squash this in.

I agree with the attempt, but not all of the hunks are good:

> @@ -281,20 +281,20 @@ static int note_tree_insert(struct int_node *tree, unsigned char n,
>  /* Free the entire notes data contained in the given tree */
>  static void note_tree_free(struct int_node *tree)
>  {
> -	if (tree->magic == (void *) ~0) {
> -		if (tree->prev) {
> -			note_tree_free(tree->prev);
> -			free(tree->prev);
> +	if (tree->u.s.magic == (void *) ~0) {
> +		if (tree->u.s.prev) {
> +			note_tree_free(tree->u.s.prev);
> +			free(tree->u.s.prev);
>  		}
> -		if (tree->child) {
> -			note_tree_free(tree->child);
> -			free(tree->child);
> +		if (tree->u.s.magic) {
> +			note_tree_free(tree->u.s.magic);
> +			free(tree->u.s.magic);

Here, you are replacing tree->child with tree->u.s.magic. Shouldn't that be 
tree->u.s.child instead?
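
To spell out why the substitution matters, here is a minimal sketch. The
struct layout and the DATE_NODE_MAGIC macro are assumptions made for
illustration only (the patch itself compares against (void *) ~0, and the
real int_node has more members); the point is just that "magic" tags the
node type while "child" is the link to follow:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel; stands in for the patch's (void *) ~0. */
#define DATE_NODE_MAGIC ((void *) ~(uintptr_t) 0)

/* Hypothetical, reduced version of the node from 13/14. */
struct int_node {
	union {
		void *a[16];		/* SHA1-based node: 16 slots */
		struct {		/* date-based node */
			void *magic;	/* tags this arm of the union */
			struct int_node *child, *prev, *parent;
		} s;
	} u;
};

static int is_date_node(const struct int_node *n)
{
	return n->u.s.magic == DATE_NODE_MAGIC;
}

/* Intended initialisation: set the tag, clear the child link. */
static void init_date_node(struct int_node *n,
			   struct int_node *prev, struct int_node *parent)
{
	n->u.s.magic = DATE_NODE_MAGIC;
	n->u.s.child = NULL;
	n->u.s.prev = prev;
	n->u.s.parent = parent;
}

/* With the mis-substitution, the second store wipes out the tag. */
static void init_date_node_buggy(struct int_node *n,
				 struct int_node *prev, struct int_node *parent)
{
	n->u.s.magic = DATE_NODE_MAGIC;
	n->u.s.magic = NULL;	/* should have been u.s.child */
	n->u.s.prev = prev;
	n->u.s.parent = parent;
}
```

After the buggy variant, is_date_node() no longer recognises the node, so
the date-based lookup loop would never descend into it.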

> @@ -439,12 +439,12 @@ static void load_date_subtree(struct tree_desc *tree_desc,
>  		else  /* this is the last entry, store directly into node */
>  			new_node = node;
> 
> -		new_node->magic = (void *) ~0;
> -		new_node->child = NULL;
> -		new_node->prev = cur_node;
> -		new_node->parent = parent;
> -		hashcpy(new_node->tree_sha1, entry.sha1);
> -		strcpy(new_node->period, period);
> +		new_node->u.s.magic = (void *) ~0;
> +		new_node->u.s.magic = NULL;

Same as above: new_node->u.s.child

> +		new_node->u.s.prev = cur_node;
> +		new_node->u.s.parent = parent;
> +		hashcpy(new_node->u.s.tree_sha1, entry.sha1);
> +		strcpy(new_node->u.s.period, period);
>  		cur_node = new_node;
>  	}
>  	assert(!cur_node || cur_node == node);
> @@ -552,38 +552,38 @@ static unsigned char *lookup_notes(const struct commit *commit)
>  	/* Convert commit->date to YYYY-MM-DD format */
>  	short_date = show_date(commit->date, 0, DATE_SHORT);
> 
> -	while (node->magic == (void *) ~0) {  /* date-based node */
> -		int cmp = SUBTREE_DATE_PREFIXCMP(short_date, node->period);
> +	while (node->u.s.magic == (void *) ~0) {  /* date-based node */
> +		int cmp = SUBTREE_DATE_PREFIXCMP(short_date, node->u.s.period);
>  		if (cmp == 0) {
>  			/* Search inside child node */
> -			if (!node->child) {
> +			if (!node->u.s.magic) {
>  				/* Must unpack child node first */
> -				node->child = (struct int_node *)
> +				node->u.s.magic = (struct int_node *)
>  					xcalloc(sizeof(struct int_node), 1);
> -				load_subtree(node->tree_sha1,
> -					(const unsigned char *) node->period,
> -					strlen(node->period), node->child,
> +				load_subtree(node->u.s.tree_sha1,
> +					(const unsigned char *) node->u.s.period,
> +					strlen(node->u.s.period), node->u.s.magic,
>  					node, -1);
>  			}
>  			seen_node = node;
> -			node = node->child;
> +			node = node->u.s.magic;

Same again, 4 times in the above hunk.

> @@ -591,15 +591,15 @@ static unsigned char *lookup_notes(const struct commit *commit)
>  		}
>  	}
>  	while (cur_node &&
> -	       SUBTREE_DATE_PREFIXCMP(cur_node->period, seen_node->period) < 0)
> +	       SUBTREE_DATE_PREFIXCMP(cur_node->u.s.period, seen_node->u.s.period) < 0) {
>  		/*
>  		 * We're about to move cur_node backwards in history. We are
>  		 * unlikely to need this cur_node in the future, so free() it.
>  		 */
> -		note_tree_free(cur_node->child);
> -		cur_node->child = NULL;
> -		cur_node = cur_node->parent;
> +		note_tree_free(cur_node->u.s.magic);
> +		cur_node->u.s.magic = NULL;

...and another one here.

> +		cur_node = cur_node->u.s.parent;
>  	}
>  	cur_node = seen_node;
> 


But as I said above, you may want to drop 13/14 and 14/14 completely, 
instead.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCHv6 13/14] Allow flexible organization of notes trees, using both commit date and SHA1
  2009-09-12 22:33         ` Johan Herland
@ 2009-09-12 23:37           ` Junio C Hamano
  0 siblings, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2009-09-12 23:37 UTC (permalink / raw)
  To: Johan Herland
  Cc: Junio C Hamano, git, Johannes.Schindelin, trast, tavestbo, git,
	chriscool, spearce

Johan Herland <johan@herland.net> writes:

> But as I said above, you may want to drop 13/14 and 14/14 completely, 
> instead.

Thanks.  Will do.

