* [PATCH v4 00/24] Index-v5
@ 2013-11-27 12:00 Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 01/24] t2104: Don't fail for index versions other than [23] Thomas Gummerer
                   ` (24 more replies)
  0 siblings, 25 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Hi,

previous rounds (without api) are at $gmane/202752, $gmane/202923,
$gmane/203088 and $gmane/203517, the previous rounds with api were at
$gmane/229732, $gmane/230210 and $gmane/232488.  Thanks to Duy for
reviewing the last round and Junio, Ramsay and Eric for additional
comments.

Since the last round I've added a POC for partial writing, resulting
in the following performance improvements for update-index:

Test                                        1063432           HEAD
------------------------------------------------------------------------------------
0003.2: v[23]: update-index                 0.60(0.38+0.20)   0.76(0.36+0.17) +26.7%
0003.3: v[23]: grep nonexistent -- subdir   0.28(0.17+0.11)   0.28(0.18+0.09) +0.0%
0003.4: v[23]: ls-files -- subdir           0.26(0.15+0.10)   0.24(0.14+0.09) -7.7%
0003.7: v[23] update-index                  0.59(0.36+0.22)   0.58(0.36+0.20) -1.7%
0003.9: v4: update-index                    0.46(0.28+0.17)   0.45(0.30+0.11) -2.2%
0003.10: v4: grep nonexistent -- subdir     0.26(0.14+0.11)   0.21(0.14+0.07) -19.2%
0003.11: v4: ls-files -- subdir             0.24(0.14+0.10)   0.20(0.12+0.08) -16.7%
0003.14: v4 update-index                    0.49(0.31+0.18)   0.65(0.34+0.17) +32.7%
0003.16: v5: update-index                   0.53(0.30+0.22)   0.50(0.28+0.20) -5.7%
0003.17: v5: ls-files                       0.27(0.15+0.12)   0.27(0.17+0.10) +0.0%
0003.18: v5: grep nonexistent -- subdir     0.02(0.01+0.01)   0.03(0.01+0.01) +50.0%
0003.19: v5: ls-files -- subdir             0.02(0.00+0.02)   0.02(0.01+0.01) +0.0%
0003.22: v5 update-index                    0.53(0.29+0.23)   0.02(0.01+0.01) -96.2%

Given this, I don't think a complete change of the in-core format for
the cache-entries is necessary to take full advantage of the new index
file format.  Instead some changes to the current in-core format would
work well with the new on-disk format.

The current in-memory format fits the internal needs of git fairly well,
so I don't think changing it to fit a better index file format would
make a lot of sense, given that we can take advantage of the new format
with the existing in-memory format.
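
As a reading aid for the timing table above: each cell is
elapsed(user+sys) time in seconds as printed by the perf suite, and the
last column is the relative change in elapsed time from 1063432 to
HEAD.  A minimal sketch of that arithmetic, where percent_change is a
made-up helper name and not part of git or the perf library:

```c
/*
 * Relative change of the elapsed time `after` compared to `before`,
 * expressed in percent.  Negative values mean HEAD got faster.
 * Hypothetical helper, for illustration only.
 */
double percent_change(double before, double after)
{
	return 100.0 * (after - before) / before;
}
```

For test 0003.22, percent_change(0.53, 0.02) comes out at roughly
-96.2, which matches the last row of the table.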

This series doesn't use kb/fast-hashmap yet, but that should be fairly
simple to change if the series is deemed a good change.  The
performance tests for update-index require
tg/perf-lib-test-perf-cleanup.

Other changes, made following the review comments, are:

documentation: add documentation of the index-v5 file format
  - Update documentation that directory flags are now 32 bits, which
    makes aligned access simpler
  - offset_to_offset is no longer included in the checksum for files.
    It's unnecessary.

read-cache: read index-v5
  - Add fix for reading with different level pathspecs given
  - Use init_directory_entry to initialize all fields in a new
    directory entry
  - use memset to simplify the create_new_conflict function
  - Add comments to explain -5 when reading directories and files
  - Add comments for the more complex functions
  - Add name flex_array to the end of ondisk_directory_entry for
    simplified reading
  - Add name flex_array to the end of ondisk_cache_entry for
    simplified reading
  - Move conflict reading functions to next patch
  - mark functions as static when they are

read-cache: read resolve-undo data
  - Add comments for the more complex function
  - Read conflicts + resolve undo data as extension

read-cache: read cache-tree in index-v5
  - Add comments for the more complex function
  - Instead of sorting the directory entries, sort the cache-tree
    directly.  This also required changing the algorithms with which
    the cache entries are extracted from the directory tree.

read-cache: write index-v5
  - Free pointers allocated by super_directory
  - Rewrite condition as suggested by Duy
  - Don't check for CE_REMOVE'd entries in the writing code, they are
    already checked in the compile_directory_data code
  - Remove overly complicated directory size calculation since flags
    are now 32 bits

read-cache: write resolve-undo data for index-v5
  - Free pointers allocated by super_directory
  - Write conflicts + resolve undo data as extension

introduce GIT_INDEX_VERSION environment variable
  - Add documentation for GIT_INDEX_VERSION

test-lib: allow setting the index format version

Removed commits:
  - read-cache: don't check uid, gid, ino
  - read-cache: use fixed width integer types (independently in pu)
  - read-cache: clear version in discard_index()

Typos fixed as suggested by Eric Sunshine

Thomas Gummerer (22):
  read-cache: split index file version specific functionality
  read-cache: move index v2 specific functions to their own file
  read-cache: Re-read index if index file changed
  add documentation for the index api
  read-cache: add index reading api
  make sure partially read index is not changed
  grep.c: use index api
  ls-files.c: use index api
  documentation: add documentation of the index-v5 file format
  read-cache: make in-memory format aware of stat_crc
  read-cache: read index-v5
  read-cache: read resolve-undo data
  read-cache: read cache-tree in index-v5
  read-cache: write index-v5
  read-cache: write index-v5 cache-tree data
  read-cache: write resolve-undo data for index-v5
  update-index.c: rewrite index when index-version is given
  introduce GIT_INDEX_VERSION environment variable
  test-lib: allow setting the index format version
  t1600: add index v5 specific tests
  POC for partial writing
  perf: add partial writing test

Thomas Rast (1):
  p0003-index.sh: add perf test for the index formats

 Documentation/git.txt                            |    5 +
 Documentation/technical/api-in-core-index.txt    |   56 +-
 Documentation/technical/index-file-format-v5.txt |  294 +++++
 Makefile                                         |   10 +
 builtin/apply.c                                  |    2 +
 builtin/grep.c                                   |   69 +-
 builtin/ls-files.c                               |   36 +-
 builtin/update-index.c                           |   50 +-
 cache-tree.c                                     |   15 +-
 cache-tree.h                                     |    2 +
 cache.h                                          |  115 +-
 lockfile.c                                       |    2 +-
 read-cache-v2.c                                  |  561 +++++++++
 read-cache-v5.c                                  | 1406 ++++++++++++++++++++++
 read-cache.c                                     |  691 +++--------
 read-cache.h                                     |   67 ++
 resolve-undo.c                                   |    1 +
 t/perf/p0003-index.sh                            |   74 ++
 t/t1600-index-v5.sh                              |   25 +
 t/t2101-update-index-reupdate.sh                 |   12 +-
 t/test-lib-functions.sh                          |    5 +
 t/test-lib.sh                                    |    3 +
 test-index-version.c                             |    6 +
 unpack-trees.c                                   |    3 +-
 24 files changed, 2921 insertions(+), 589 deletions(-)
 create mode 100644 Documentation/technical/index-file-format-v5.txt
 create mode 100644 read-cache-v2.c
 create mode 100644 read-cache-v5.c
 create mode 100644 read-cache.h
 create mode 100755 t/perf/p0003-index.sh
 create mode 100755 t/t1600-index-v5.sh

-- 
1.8.4.2


* [PATCH v4 01/24] t2104: Don't fail for index versions other than [23]
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 02/24] read-cache: split index file version specific functionality Thomas Gummerer
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

t2104 currently checks for the exact index version 2 or 3,
depending on whether a skip-worktree flag is set or not. Other
index versions do not use extended flags and thus cannot
be tested for version changes.

Make this test update the index to version 2 at the beginning
of the test. Testing the skip-worktree flags for the default
index format is still covered by t7011 and t7012.
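
The version 2 versus version 3 rule that the test relies on can be
sketched as below.  This is only an illustration of the rule described
above, not the code in read-cache.c; SKETCH_SKIP_WORKTREE and
ondisk_version_for are hypothetical names, and the bit value is
arbitrary.

```c
#include <stddef.h>

/* Hypothetical stand-in for an extended per-entry flag bit. */
#define SKETCH_SKIP_WORKTREE (1u << 30)

/*
 * Return the on-disk version a v2/v3 index would need: 3 as soon as
 * any entry carries the extended flag, 2 otherwise.
 */
int ondisk_version_for(const unsigned int *entry_flags, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (entry_flags[i] & SKETCH_SKIP_WORKTREE)
			return 3;
	return 2;
}
```

Index versions that do not switch on extended flags have no such
2-to-3 transition to observe, which is why the test pins the version
up front.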

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 t/t2104-update-index-skip-worktree.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t2104-update-index-skip-worktree.sh b/t/t2104-update-index-skip-worktree.sh
index 1d0879b..bd9644f 100755
--- a/t/t2104-update-index-skip-worktree.sh
+++ b/t/t2104-update-index-skip-worktree.sh
@@ -22,6 +22,7 @@ H sub/2
 EOF
 
 test_expect_success 'setup' '
+	git update-index --index-version=2 &&
 	mkdir sub &&
 	touch ./1 ./2 sub/1 sub/2 &&
 	git add 1 2 sub/1 sub/2 &&
-- 
1.8.4.2


* [PATCH v4 02/24] read-cache: split index file version specific functionality
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 01/24] t2104: Don't fail for index versions other than [23] Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 03/24] read-cache: move index v2 specific functions to their own file Thomas Gummerer
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Split index file version specific functionality into its own functions,
to prepare for moving the index file version specific parts to their own
file.  This makes it easier to add a new index file format later.
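
The version-independent half of the header check can be pictured
roughly as follows.  This is a self-contained sketch, not the patch
itself: it works on a plain byte buffer instead of struct cache_header,
header_version and get_be32_sketch are hypothetical names, and the
SHA-1 trailer is only accounted for in the size check, not verified.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIGNATURE 0x44495243u	/* "DIRC" */
#define INDEX_FORMAT_LB 2
#define INDEX_FORMAT_UB 4

/* Decode a 32-bit big-endian value (the index stores network order). */
static uint32_t get_be32_sketch(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return the index version found in the header, or -1 if the buffer
 * is too small, the signature is wrong, or the version is out of
 * range.  12 bytes of header (three 32-bit fields) plus 20 bytes of
 * trailing SHA-1 is the minimum file size.
 */
int header_version(const unsigned char *buf, size_t len)
{
	uint32_t version;

	if (len < 12 + 20)
		return -1;
	if (get_be32_sketch(buf) != CACHE_SIGNATURE)
		return -1;
	version = get_be32_sketch(buf + 4);
	if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version)
		return -1;
	return (int)version;
}
```

Once the version is known, a caller can hand the mapped file to the
reader for that format, which is where the checksum verification and
entry parsing would live.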

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 read-cache.c | 114 ++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 74 insertions(+), 40 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 33dd676..5a8f405 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1269,10 +1269,8 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr_version(struct cache_header *hdr, unsigned long size)
 {
-	git_SHA_CTX c;
-	unsigned char sha1[20];
 	int hdr_version;
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
@@ -1280,10 +1278,21 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size)
 	hdr_version = ntohl(hdr->hdr_version);
 	if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
 		return error("bad index version %d", hdr_version);
+	return 0;
+}
+
+static int verify_hdr(void *mmap, unsigned long size)
+{
+	git_SHA_CTX c;
+	unsigned char sha1[20];
+
+	if (size < sizeof(struct cache_header) + 20)
+		die("index file smaller than expected");
+
 	git_SHA1_Init(&c);
-	git_SHA1_Update(&c, hdr, size - 20);
+	git_SHA1_Update(&c, mmap, size - 20);
 	git_SHA1_Final(sha1, &c);
-	if (hashcmp(sha1, (unsigned char *)hdr + size - 20))
+	if (hashcmp(sha1, (unsigned char *)mmap + size - 20))
 		return error("bad index file sha1 signature");
 	return 0;
 }
@@ -1425,44 +1434,14 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 	return ce;
 }
 
-/* remember to discard_cache() before reading a different cache! */
-int read_index_from(struct index_state *istate, const char *path)
+static int read_index_v2(struct index_state *istate, void *mmap, unsigned long mmap_size)
 {
-	int fd, i;
-	struct stat st;
+	int i;
 	unsigned long src_offset;
 	struct cache_header *hdr;
-	void *mmap;
-	size_t mmap_size;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
 
-	if (istate->initialized)
-		return istate->cache_nr;
-
-	istate->timestamp.sec = 0;
-	istate->timestamp.nsec = 0;
-	fd = open(path, O_RDONLY);
-	if (fd < 0) {
-		if (errno == ENOENT)
-			return 0;
-		die_errno("index file open failed");
-	}
-
-	if (fstat(fd, &st))
-		die_errno("cannot stat the open index");
-
-	mmap_size = xsize_t(st.st_size);
-	if (mmap_size < sizeof(struct cache_header) + 20)
-		die("index file smaller than expected");
-
-	mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
-	if (mmap == MAP_FAILED)
-		die_errno("unable to map index file");
-	close(fd);
-
 	hdr = mmap;
-	if (verify_hdr(hdr, mmap_size) < 0)
-		goto unmap;
 
 	istate->version = ntohl(hdr->hdr_version);
 	istate->cache_nr = ntohl(hdr->hdr_entries);
@@ -1488,8 +1467,6 @@ int read_index_from(struct index_state *istate, const char *path)
 		src_offset += consumed;
 	}
 	strbuf_release(&previous_name_buf);
-	istate->timestamp.sec = st.st_mtime;
-	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 
 	while (src_offset <= mmap_size - 20 - 8) {
 		/* After an array of active_nr index entries,
@@ -1509,6 +1486,58 @@ int read_index_from(struct index_state *istate, const char *path)
 		src_offset += 8;
 		src_offset += extsize;
 	}
+	return 0;
+unmap:
+	munmap(mmap, mmap_size);
+	die("index file corrupt");
+}
+
+/* remember to discard_cache() before reading a different cache! */
+int read_index_from(struct index_state *istate, const char *path)
+{
+	int fd;
+	struct stat st;
+	struct cache_header *hdr;
+	void *mmap;
+	size_t mmap_size;
+
+	errno = EBUSY;
+	if (istate->initialized)
+		return istate->cache_nr;
+
+	errno = ENOENT;
+	istate->timestamp.sec = 0;
+	istate->timestamp.nsec = 0;
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT)
+			return 0;
+		die_errno("index file open failed");
+	}
+
+	if (fstat(fd, &st))
+		die_errno("cannot stat the open index");
+
+	errno = EINVAL;
+	mmap_size = xsize_t(st.st_size);
+	if (mmap_size < sizeof(struct cache_header) + 20)
+		die("index file smaller than expected");
+
+	mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+	close(fd);
+	if (mmap == MAP_FAILED)
+		die_errno("unable to map index file");
+
+	hdr = mmap;
+	if (verify_hdr_version(hdr, mmap_size) < 0)
+		goto unmap;
+
+	if (verify_hdr(mmap, mmap_size) < 0)
+		goto unmap;
+
+	read_index_v2(istate, mmap, mmap_size);
+	istate->timestamp.sec = st.st_mtime;
+	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 	munmap(mmap, mmap_size);
 	return istate->cache_nr;
 
@@ -1772,7 +1801,7 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
 		rollback_lock_file(lockfile);
 }
 
-int write_index(struct index_state *istate, int newfd)
+static int write_index_v2(struct index_state *istate, int newfd)
 {
 	git_SHA_CTX c;
 	struct cache_header hdr;
@@ -1864,6 +1893,11 @@ int write_index(struct index_state *istate, int newfd)
 	return 0;
 }
 
+int write_index(struct index_state *istate, int newfd)
+{
+	return write_index_v2(istate, newfd);
+}
+
 /*
  * Read the index file that is potentially unmerged into given
  * index_state, dropping any unmerged entries.  Returns true if
-- 
1.8.4.2


* [PATCH v4 03/24] read-cache: move index v2 specific functions to their own file
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 01/24] t2104: Don't fail for index versions other than [23] Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 02/24] read-cache: split index file version specific functionality Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 04/24] read-cache: Re-read index if index file changed Thomas Gummerer
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Move index version 2 specific functions to their own file. The functions
that are not specific to an index version stay in read-cache.c, while the
index version 2 specific functions move to read-cache-v2.c.
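
The general shape of such a split is a table of function pointers that
the index state points at, chosen once per index version.  The sketch
below only illustrates that idea under assumed names: struct
index_sketch, struct index_ops_sketch, v2_ops_sketch and
initialize_index_sketch are all hypothetical, and the members of the
real struct index_ops in read-cache.h may differ.

```c
#include <stddef.h>

struct index_sketch;

/* Per-format operations; one instance per on-disk format family. */
struct index_ops_sketch {
	const char *name;
	int (*read_index)(struct index_sketch *istate,
			  const void *data, size_t size);
};

struct index_sketch {
	int version;
	const struct index_ops_sketch *ops;
	size_t bytes_seen;	/* only here to make the sketch observable */
};

/* Stand-in reader: records the input size instead of parsing it. */
static int read_v2_sketch(struct index_sketch *istate,
			  const void *data, size_t size)
{
	(void)data;
	istate->bytes_seen = size;
	return 0;
}

/* Versions 2, 3 and 4 share one reader, as in read-cache-v2.c. */
const struct index_ops_sketch v2_ops_sketch = { "v2", read_v2_sketch };

/* Pick the ops table for a version; -1 if no format handles it. */
int initialize_index_sketch(struct index_sketch *istate, int version)
{
	if (version < 2 || version > 4)
		return -1;
	istate->version = version;
	istate->ops = &v2_ops_sketch;
	return 0;
}
```

With this layout the version-independent code calls
istate->ops->read_index(...) without knowing which file implements it,
so a later format only needs to supply another table.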

Helped-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Makefile               |   2 +
 builtin/apply.c        |   2 +
 builtin/update-index.c |   2 +-
 cache.h                |  13 +-
 read-cache-v2.c        | 553 ++++++++++++++++++++++++++++++++++++++++++++++
 read-cache.c           | 585 +++++--------------------------------------------
 read-cache.h           |  63 ++++++
 test-index-version.c   |   6 +
 unpack-trees.c         |   3 +-
 9 files changed, 683 insertions(+), 546 deletions(-)
 create mode 100644 read-cache-v2.c
 create mode 100644 read-cache.h

diff --git a/Makefile b/Makefile
index af847f8..5c28777 100644
--- a/Makefile
+++ b/Makefile
@@ -705,6 +705,7 @@ LIB_H += progress.h
 LIB_H += prompt.h
 LIB_H += quote.h
 LIB_H += reachable.h
+LIB_H += read-cache.h
 LIB_H += reflog-walk.h
 LIB_H += refs.h
 LIB_H += remote.h
@@ -849,6 +850,7 @@ LIB_OBJS += prompt.o
 LIB_OBJS += quote.o
 LIB_OBJS += reachable.o
 LIB_OBJS += read-cache.o
+LIB_OBJS += read-cache-v2.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += remote.o
diff --git a/builtin/apply.c b/builtin/apply.c
index ef32e4f..a954147 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -3682,6 +3682,8 @@ static void build_fake_ancestor(struct patch *list, const char *filename)
 			die ("Could not add %s to temporary index", name);
 	}
 
+	if (!result.initialized)
+		initialize_index(&result, 0);
 	fd = open(filename, O_WRONLY | O_CREAT, 0666);
 	if (fd < 0 || write_index(&result, fd) || close(fd))
 		die ("Could not write temporary index to %s", filename);
diff --git a/builtin/update-index.c b/builtin/update-index.c
index e3a10d7..c5bb889 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -863,7 +863,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 
 		if (the_index.version != preferred_index_format)
 			active_cache_changed = 1;
-		the_index.version = preferred_index_format;
+		change_cache_version(preferred_index_format);
 	}
 
 	if (read_from_stdin) {
diff --git a/cache.h b/cache.h
index ce377e1..290e26d 100644
--- a/cache.h
+++ b/cache.h
@@ -95,16 +95,8 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
  */
 #define DEFAULT_GIT_PORT 9418
 
-/*
- * Basic data structures for the directory cache
- */
 
 #define CACHE_SIGNATURE 0x44495243	/* "DIRC" */
-struct cache_header {
-	uint32_t hdr_signature;
-	uint32_t hdr_version;
-	uint32_t hdr_entries;
-};
 
 #define INDEX_FORMAT_LB 2
 #define INDEX_FORMAT_UB 4
@@ -279,6 +271,7 @@ struct index_state {
 		 initialized : 1;
 	struct hash_table name_hash;
 	struct hash_table dir_hash;
+	struct index_ops *ops;
 };
 
 extern struct index_state the_index;
@@ -296,6 +289,8 @@ extern void free_name_hash(struct index_state *istate);
 #define active_cache_changed (the_index.cache_changed)
 #define active_cache_tree (the_index.cache_tree)
 
+#define initialize_cache() initialize_index(&the_index, 0)
+#define change_cache_version(version) change_index_version(&the_index, (version))
 #define read_cache() read_index(&the_index)
 #define read_cache_from(path) read_index_from(&the_index, (path))
 #define read_cache_preload(pathspec) read_index_preload(&the_index, (pathspec))
@@ -455,6 +450,8 @@ extern void sanitize_stdfds(void);
 	} while (0)
 
 /* Initialize and use the cache information */
+extern void initialize_index(struct index_state *istate, int version);
+extern void change_index_version(struct index_state *istate, int version);
 extern int read_index(struct index_state *);
 extern int read_index_preload(struct index_state *, const struct pathspec *pathspec);
 extern int read_index_from(struct index_state *, const char *path);
diff --git a/read-cache-v2.c b/read-cache-v2.c
new file mode 100644
index 0000000..a7d076c
--- /dev/null
+++ b/read-cache-v2.c
@@ -0,0 +1,553 @@
+#include "cache.h"
+#include "read-cache.h"
+#include "resolve-undo.h"
+#include "cache-tree.h"
+#include "varint.h"
+
+/* Mask for the name length in ce_flags in the on-disk index */
+#define CE_NAMEMASK  (0x0fff)
+
+/*****************************************************************
+ * Index File I/O
+ *****************************************************************/
+
+/*
+ * dev/ino/uid/gid/size are also just tracked to the low 32 bits
+ * Again - this is just a (very strong in practice) heuristic that
+ * the inode hasn't changed.
+ *
+ * We save the fields in big-endian order to allow using the
+ * index file over NFS transparently.
+ */
+struct ondisk_cache_entry {
+	struct cache_time ctime;
+	struct cache_time mtime;
+	uint32_t dev;
+	uint32_t ino;
+	uint32_t mode;
+	uint32_t uid;
+	uint32_t gid;
+	uint32_t size;
+	unsigned char sha1[20];
+	uint16_t flags;
+	char name[FLEX_ARRAY]; /* more */
+};
+
+/*
+ * This struct is used when CE_EXTENDED bit is 1
+ * The struct must match ondisk_cache_entry exactly from
+ * ctime till flags
+ */
+struct ondisk_cache_entry_extended {
+	struct cache_time ctime;
+	struct cache_time mtime;
+	uint32_t dev;
+	uint32_t ino;
+	uint32_t mode;
+	uint32_t uid;
+	uint32_t gid;
+	uint32_t size;
+	unsigned char sha1[20];
+	uint16_t flags;
+	uint16_t flags2;
+	char name[FLEX_ARRAY]; /* more */
+};
+
+/* These are only used for v3 or lower */
+#define align_flex_name(STRUCT,len) ((offsetof(struct STRUCT,name) + (len) + 8) & ~7)
+#define ondisk_cache_entry_size(len) align_flex_name(ondisk_cache_entry,len)
+#define ondisk_cache_entry_extended_size(len) align_flex_name(ondisk_cache_entry_extended,len)
+#define ondisk_ce_size(ce) (((ce)->ce_flags & CE_EXTENDED) ? \
+			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
+			    ondisk_cache_entry_size(ce_namelen(ce)))
+
+static int verify_hdr(void *mmap, unsigned long size)
+{
+	git_SHA_CTX c;
+	unsigned char sha1[20];
+
+	if (size < + sizeof(struct cache_header) + 20)
+		die("index file smaller than expected");
+
+	git_SHA1_Init(&c);
+	git_SHA1_Update(&c, mmap, size - 20);
+	git_SHA1_Final(sha1, &c);
+	if (hashcmp(sha1, (unsigned char *)mmap + size - 20))
+		return error("bad index file sha1 signature");
+	return 0;
+}
+
+static int match_stat_basic(const struct cache_entry *ce,
+			    struct stat *st, int changed)
+{
+	changed |= match_stat_data(&ce->ce_stat_data, st);
+
+	/* Racily smudged entry? */
+	if (!ce->ce_stat_data.sd_size) {
+		if (!is_empty_blob_sha1(ce->sha1))
+			changed |= DATA_CHANGED;
+	}
+	return changed;
+}
+
+static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *ondisk,
+						   unsigned int flags,
+						   const char *name,
+						   size_t len)
+{
+	struct cache_entry *ce = xmalloc(cache_entry_size(len));
+
+	ce->ce_stat_data.sd_ctime.sec = ntoh_l(ondisk->ctime.sec);
+	ce->ce_stat_data.sd_mtime.sec = ntoh_l(ondisk->mtime.sec);
+	ce->ce_stat_data.sd_ctime.nsec = ntoh_l(ondisk->ctime.nsec);
+	ce->ce_stat_data.sd_mtime.nsec = ntoh_l(ondisk->mtime.nsec);
+	ce->ce_stat_data.sd_dev   = ntoh_l(ondisk->dev);
+	ce->ce_stat_data.sd_ino   = ntoh_l(ondisk->ino);
+	ce->ce_mode  = ntoh_l(ondisk->mode);
+	ce->ce_stat_data.sd_uid   = ntoh_l(ondisk->uid);
+	ce->ce_stat_data.sd_gid   = ntoh_l(ondisk->gid);
+	ce->ce_stat_data.sd_size  = ntoh_l(ondisk->size);
+	ce->ce_flags = flags & ~CE_NAMEMASK;
+	ce->ce_namelen = len;
+	hashcpy(ce->sha1, ondisk->sha1);
+	memcpy(ce->name, name, len);
+	ce->name[len] = '\0';
+	return ce;
+}
+
+/*
+ * Adjacent cache entries tend to share the leading paths, so it makes
+ * sense to only store the differences in later entries.  In the v4
+ * on-disk format of the index, each on-disk cache entry stores the
+ * number of bytes to be stripped from the end of the previous name,
+ * and the bytes to append to the result, to come up with its name.
+ */
+static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
+{
+	const unsigned char *ep, *cp = (const unsigned char *)cp_;
+	size_t len = decode_varint(&cp);
+
+	if (name->len < len)
+		die("malformed name field in the index");
+	strbuf_remove(name, name->len - len, len);
+	for (ep = cp; *ep; ep++)
+		; /* find the end */
+	strbuf_add(name, cp, ep - cp);
+	return (const char *)ep + 1 - cp_;
+}
+
+static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
+					    unsigned long *ent_size,
+					    struct strbuf *previous_name)
+{
+	struct cache_entry *ce;
+	size_t len;
+	const char *name;
+	unsigned int flags;
+
+	/* On-disk flags are just 16 bits */
+	flags = ntoh_s(ondisk->flags);
+	len = flags & CE_NAMEMASK;
+
+	if (flags & CE_EXTENDED) {
+		struct ondisk_cache_entry_extended *ondisk2;
+		int extended_flags;
+		ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;
+		extended_flags = ntoh_s(ondisk2->flags2) << 16;
+		/* We do not yet understand any bit out of CE_EXTENDED_FLAGS */
+		if (extended_flags & ~CE_EXTENDED_FLAGS)
+			die("Unknown index entry format %08x", extended_flags);
+		flags |= extended_flags;
+		name = ondisk2->name;
+	}
+	else
+		name = ondisk->name;
+
+	if (!previous_name) {
+		/* v3 and earlier */
+		if (len == CE_NAMEMASK)
+			len = strlen(name);
+		ce = cache_entry_from_ondisk(ondisk, flags, name, len);
+
+		*ent_size = ondisk_ce_size(ce);
+	} else {
+		unsigned long consumed;
+		consumed = expand_name_field(previous_name, name);
+		ce = cache_entry_from_ondisk(ondisk, flags,
+					     previous_name->buf,
+					     previous_name->len);
+
+		*ent_size = (name - ((char *)ondisk)) + consumed;
+	}
+	return ce;
+}
+
+static int read_index_extension(struct index_state *istate,
+				const char *ext, void *data, unsigned long sz)
+{
+	switch (CACHE_EXT(ext)) {
+	case CACHE_EXT_TREE:
+		istate->cache_tree = cache_tree_read(data, sz);
+		break;
+	case CACHE_EXT_RESOLVE_UNDO:
+		istate->resolve_undo = resolve_undo_read(data, sz);
+		break;
+	default:
+		if (*ext < 'A' || 'Z' < *ext)
+			return error("index uses %.4s extension, which we do not understand",
+				     ext);
+		fprintf(stderr, "ignoring %.4s extension\n", ext);
+		break;
+	}
+	return 0;
+}
+
+static int read_index_v2(struct index_state *istate, void *mmap,
+			 unsigned long mmap_size)
+{
+	int i;
+	unsigned long src_offset;
+	struct cache_header *hdr;
+	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+
+	hdr = mmap;
+	istate->cache_nr = ntohl(hdr->hdr_entries);
+	istate->cache_alloc = alloc_nr(istate->cache_nr);
+	istate->cache = xcalloc(istate->cache_alloc, sizeof(struct cache_entry *));
+
+	if (istate->version == 4)
+		previous_name = &previous_name_buf;
+	else
+		previous_name = NULL;
+
+	src_offset = sizeof(*hdr);
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct ondisk_cache_entry *disk_ce;
+		struct cache_entry *ce;
+		unsigned long consumed;
+
+		disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
+		ce = create_from_disk(disk_ce, &consumed, previous_name);
+		set_index_entry(istate, i, ce);
+
+		src_offset += consumed;
+	}
+	strbuf_release(&previous_name_buf);
+
+	while (src_offset <= mmap_size - 20 - 8) {
+		/* After an array of active_nr index entries,
+		 * there can be arbitrary number of extended
+		 * sections, each of which is prefixed with
+		 * extension name (4-byte) and section length
+		 * in 4-byte network byte order.
+		 */
+		uint32_t extsize;
+		memcpy(&extsize, (char *)mmap + src_offset + 4, 4);
+		extsize = ntohl(extsize);
+		if (read_index_extension(istate,
+					(const char *) mmap + src_offset,
+					(char *) mmap + src_offset + 8,
+					extsize) < 0)
+			goto unmap;
+		src_offset += 8;
+		src_offset += extsize;
+	}
+	return 0;
+unmap:
+	munmap(mmap, mmap_size);
+	die("index file corrupt");
+}
+
+#define WRITE_BUFFER_SIZE 8192
+static unsigned char write_buffer[WRITE_BUFFER_SIZE];
+static unsigned long write_buffer_len;
+
+static int ce_write_flush(git_SHA_CTX *context, int fd)
+{
+	unsigned int buffered = write_buffer_len;
+	if (buffered) {
+		git_SHA1_Update(context, write_buffer, buffered);
+		if (write_in_full(fd, write_buffer, buffered) != buffered)
+			return -1;
+		write_buffer_len = 0;
+	}
+	return 0;
+}
+
+static int ce_write(git_SHA_CTX *context, int fd, void *data, unsigned int len)
+{
+	while (len) {
+		unsigned int buffered = write_buffer_len;
+		unsigned int partial = WRITE_BUFFER_SIZE - buffered;
+		if (partial > len)
+			partial = len;
+		memcpy(write_buffer + buffered, data, partial);
+		buffered += partial;
+		if (buffered == WRITE_BUFFER_SIZE) {
+			write_buffer_len = buffered;
+			if (ce_write_flush(context, fd))
+				return -1;
+			buffered = 0;
+		}
+		write_buffer_len = buffered;
+		len -= partial;
+		data = (char *) data + partial;
+	}
+	return 0;
+}
+
+static int write_index_ext_header(git_SHA_CTX *context, int fd,
+				  unsigned int ext, unsigned int sz)
+{
+	ext = htonl(ext);
+	sz = htonl(sz);
+	return ((ce_write(context, fd, &ext, 4) < 0) ||
+		(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;
+}
+
+static int ce_flush(git_SHA_CTX *context, int fd)
+{
+	unsigned int left = write_buffer_len;
+
+	if (left) {
+		write_buffer_len = 0;
+		git_SHA1_Update(context, write_buffer, left);
+	}
+
+	/* Flush first if not enough space for SHA1 signature */
+	if (left + 20 > WRITE_BUFFER_SIZE) {
+		if (write_in_full(fd, write_buffer, left) != left)
+			return -1;
+		left = 0;
+	}
+
+	/* Append the SHA1 signature at the end */
+	git_SHA1_Final(write_buffer + left, context);
+	left += 20;
+	return (write_in_full(fd, write_buffer, left) != left) ? -1 : 0;
+}
+
+static void ce_smudge_racily_clean_entry(struct index_state *istate, struct cache_entry *ce)
+{
+	/*
+	 * The only thing we care about in this function is to smudge the
+	 * falsely clean entry due to touch-update-touch race, so we leave
+	 * everything else as they are.  We are called for entries whose
+	 * ce_stat_data.sd_mtime match the index file mtime.
+	 *
+	 * Note that this actually does not do much for gitlinks, for
+	 * which ce_match_stat_basic() always goes to the actual
+	 * contents.  The caller checks with is_racy_timestamp() which
+	 * always says "no" for gitlinks, so we are not called for them ;-)
+	 */
+	struct stat st;
+
+	if (lstat(ce->name, &st) < 0)
+		return;
+	if (ce_match_stat_basic(istate, ce, &st))
+		return;
+	if (ce_modified_check_fs(ce, &st)) {
+		/* This is "racily clean"; smudge it.  Note that this
+		 * is a tricky code.  At first glance, it may appear
+		 * that it can break with this sequence:
+		 *
+		 * $ echo xyzzy >frotz
+		 * $ git-update-index --add frotz
+		 * $ : >frotz
+		 * $ sleep 3
+		 * $ echo filfre >nitfol
+		 * $ git-update-index --add nitfol
+		 *
+		 * but it does not.  When the second update-index runs,
+		 * it notices that the entry "frotz" has the same timestamp
+		 * as index, and if we were to smudge it by resetting its
+		 * size to zero here, then the object name recorded
+		 * in index is the 6-byte file but the cached stat information
+		 * becomes zero --- which would then match what we would
+		 * obtain from the filesystem next time we stat("frotz").
+		 *
+		 * However, the second update-index, before calling
+		 * this function, notices that the cached size is 6
+		 * bytes and what is on the filesystem is an empty
+		 * file, and never calls us, so the cached size information
+		 * for "frotz" stays 6 which does not match the filesystem.
+		 */
+		ce->ce_stat_data.sd_size = 0;
+	}
+}
+
+/* Copy miscellaneous fields but not the name */
+static char *copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk,
+				       struct cache_entry *ce)
+{
+	short flags;
+
+	ondisk->ctime.sec = htonl(ce->ce_stat_data.sd_ctime.sec);
+	ondisk->mtime.sec = htonl(ce->ce_stat_data.sd_mtime.sec);
+	ondisk->ctime.nsec = htonl(ce->ce_stat_data.sd_ctime.nsec);
+	ondisk->mtime.nsec = htonl(ce->ce_stat_data.sd_mtime.nsec);
+	ondisk->dev  = htonl(ce->ce_stat_data.sd_dev);
+	ondisk->ino  = htonl(ce->ce_stat_data.sd_ino);
+	ondisk->mode = htonl(ce->ce_mode);
+	ondisk->uid  = htonl(ce->ce_stat_data.sd_uid);
+	ondisk->gid  = htonl(ce->ce_stat_data.sd_gid);
+	ondisk->size = htonl(ce->ce_stat_data.sd_size);
+	hashcpy(ondisk->sha1, ce->sha1);
+
+	flags = ce->ce_flags;
+	flags |= (ce_namelen(ce) >= CE_NAMEMASK ? CE_NAMEMASK : ce_namelen(ce));
+	ondisk->flags = htons(flags);
+	if (ce->ce_flags & CE_EXTENDED) {
+		struct ondisk_cache_entry_extended *ondisk2;
+		ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;
+		ondisk2->flags2 = htons((ce->ce_flags & CE_EXTENDED_FLAGS) >> 16);
+		return ondisk2->name;
+	}
+	else {
+		return ondisk->name;
+	}
+}
+
+static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce,
+			  struct strbuf *previous_name)
+{
+	int size;
+	struct ondisk_cache_entry *ondisk;
+	char *name;
+	int result;
+
+	if (!previous_name) {
+		size = ondisk_ce_size(ce);
+		ondisk = xcalloc(1, size);
+		name = copy_cache_entry_to_ondisk(ondisk, ce);
+		memcpy(name, ce->name, ce_namelen(ce));
+	} else {
+		int common, to_remove, prefix_size;
+		unsigned char to_remove_vi[16];
+		for (common = 0;
+		     (ce->name[common] &&
+		      common < previous_name->len &&
+		      ce->name[common] == previous_name->buf[common]);
+		     common++)
+			; /* still matching */
+		to_remove = previous_name->len - common;
+		prefix_size = encode_varint(to_remove, to_remove_vi);
+
+		if (ce->ce_flags & CE_EXTENDED)
+			size = offsetof(struct ondisk_cache_entry_extended, name);
+		else
+			size = offsetof(struct ondisk_cache_entry, name);
+		size += prefix_size + (ce_namelen(ce) - common + 1);
+
+		ondisk = xcalloc(1, size);
+		name = copy_cache_entry_to_ondisk(ondisk, ce);
+		memcpy(name, to_remove_vi, prefix_size);
+		memcpy(name + prefix_size, ce->name + common, ce_namelen(ce) - common);
+
+		strbuf_splice(previous_name, common, to_remove,
+			      ce->name + common, ce_namelen(ce) - common);
+	}
+
+	result = ce_write(c, fd, ondisk, size);
+	free(ondisk);
+	return result;
+}
+
+static int write_index_v2(struct index_state *istate, int newfd)
+{
+	git_SHA_CTX c;
+	struct cache_header hdr;
+	int i, err, removed, extended, hdr_version;
+	struct cache_entry **cache = istate->cache;
+	int entries = istate->cache_nr;
+	struct stat st;
+	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+
+	for (i = removed = extended = 0; i < entries; i++) {
+		if (cache[i]->ce_flags & CE_REMOVE)
+			removed++;
+
+		/* reduce extended entries if possible */
+		cache[i]->ce_flags &= ~CE_EXTENDED;
+		if (cache[i]->ce_flags & CE_EXTENDED_FLAGS) {
+			extended++;
+			cache[i]->ce_flags |= CE_EXTENDED;
+		}
+	}
+
+	if (!istate->version)
+		istate->version = INDEX_FORMAT_DEFAULT;
+
+	/* demote version 3 to version 2 when the latter suffices */
+	if (istate->version == 3 || istate->version == 2)
+		istate->version = extended ? 3 : 2;
+
+	hdr_version = istate->version;
+
+	hdr.hdr_signature = htonl(CACHE_SIGNATURE);
+	hdr.hdr_version = htonl(hdr_version);
+	hdr.hdr_entries = htonl(entries - removed);
+
+	git_SHA1_Init(&c);
+	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
+		return -1;
+
+	previous_name = (hdr_version == 4) ? &previous_name_buf : NULL;
+	for (i = 0; i < entries; i++) {
+		struct cache_entry *ce = cache[i];
+		if (ce->ce_flags & CE_REMOVE)
+			continue;
+		if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
+			ce_smudge_racily_clean_entry(istate, ce);
+		if (is_null_sha1(ce->sha1)) {
+			static const char msg[] = "cache entry has null sha1: %s";
+			static int allow = -1;
+
+			if (allow < 0)
+				allow = git_env_bool("GIT_ALLOW_NULL_SHA1", 0);
+			if (allow)
+				warning(msg, ce->name);
+			else
+				return error(msg, ce->name);
+		}
+		if (ce_write_entry(&c, newfd, ce, previous_name) < 0)
+			return -1;
+	}
+	strbuf_release(&previous_name_buf);
+
+	/* Write extension data here */
+	if (istate->cache_tree) {
+		struct strbuf sb = STRBUF_INIT;
+
+		cache_tree_write(&sb, istate->cache_tree);
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_TREE, sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
+	if (istate->resolve_undo) {
+		struct strbuf sb = STRBUF_INIT;
+
+		resolve_undo_write(&sb, istate->resolve_undo);
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_RESOLVE_UNDO,
+					     sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
+
+	if (ce_flush(&c, newfd) || fstat(newfd, &st))
+		return -1;
+	istate->timestamp.sec = (unsigned int)st.st_mtime;
+	istate->timestamp.nsec = ST_MTIME_NSEC(st);
+	return 0;
+}
+
+struct index_ops v2_ops = {
+	match_stat_basic,
+	verify_hdr,
+	read_index_v2,
+	write_index_v2
+};
diff --git a/read-cache.c b/read-cache.c
index 5a8f405..e081084 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -5,6 +5,7 @@
  */
 #define NO_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "read-cache.h"
 #include "cache-tree.h"
 #include "refs.h"
 #include "dir.h"
@@ -17,26 +18,9 @@
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int really);
 
-/* Mask for the name length in ce_flags in the on-disk index */
-
-#define CE_NAMEMASK  (0x0fff)
-
-/* Index extensions.
- *
- * The first letter should be 'A'..'Z' for extensions that are not
- * necessary for a correct operation (i.e. optimization data).
- * When new extensions are added that _needs_ to be understood in
- * order to correctly interpret the index file, pick character that
- * is outside the range, to cause the reader to abort.
- */
-
-#define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
-#define CACHE_EXT_TREE 0x54524545	/* "TREE" */
-#define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
-
 struct index_state the_index;
 
-static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
+void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
 	istate->cache[nr] = ce;
 	add_name_hash(istate, ce);
@@ -190,7 +174,7 @@ static int ce_compare_gitlink(const struct cache_entry *ce)
 	return hashcmp(sha1, ce->sha1);
 }
 
-static int ce_modified_check_fs(const struct cache_entry *ce, struct stat *st)
+int ce_modified_check_fs(const struct cache_entry *ce, struct stat *st)
 {
 	switch (st->st_mode & S_IFMT) {
 	case S_IFREG:
@@ -210,7 +194,18 @@ static int ce_modified_check_fs(const struct cache_entry *ce, struct stat *st)
 	return 0;
 }
 
-static int ce_match_stat_basic(const struct cache_entry *ce, struct stat *st)
+/*
+ * Check if the reading/writing operations are set and set them
+ * to the correct version
+ */
+static void set_istate_ops(struct index_state *istate)
+{
+	if (istate->version >= 2 && istate->version <= 4)
+		istate->ops = &v2_ops;
+}
+
+int ce_match_stat_basic(const struct index_state *istate,
+			const struct cache_entry *ce, struct stat *st)
 {
 	unsigned int changed = 0;
 
@@ -243,19 +238,13 @@ static int ce_match_stat_basic(const struct cache_entry *ce, struct stat *st)
 		die("internal error: ce_mode is %o", ce->ce_mode);
 	}
 
-	changed |= match_stat_data(&ce->ce_stat_data, st);
-
-	/* Racily smudged entry? */
-	if (!ce->ce_stat_data.sd_size) {
-		if (!is_empty_blob_sha1(ce->sha1))
-			changed |= DATA_CHANGED;
-	}
-
+	changed = istate->ops->match_stat_basic(ce, st, changed);
 	return changed;
 }
 
-static int is_racy_timestamp(const struct index_state *istate,
-			     const struct cache_entry *ce)
+
+int is_racy_timestamp(const struct index_state *istate,
+		      const struct cache_entry *ce)
 {
 	return (!S_ISGITLINK(ce->ce_mode) &&
 		istate->timestamp.sec &&
@@ -298,7 +287,7 @@ int ie_match_stat(const struct index_state *istate,
 	if (ce->ce_flags & CE_INTENT_TO_ADD)
 		return DATA_CHANGED | TYPE_CHANGED | MODE_CHANGED;
 
-	changed = ce_match_stat_basic(ce, st);
+	changed = ce_match_stat_basic(istate, ce, st);
 
 	/*
 	 * Within 1 second of this sequence:
@@ -982,6 +971,8 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
 {
 	int pos;
 
+	if (!istate->initialized)
+		initialize_index(istate, INDEX_FORMAT_DEFAULT);
 	if (option & ADD_CACHE_JUST_APPEND)
 		pos = istate->cache_nr;
 	else {
@@ -1212,13 +1203,25 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 	return refresh_cache_ent(&the_index, ce, really, NULL, NULL);
 }
 
+void initialize_index(struct index_state *istate, int version)
+{
+	istate->initialized = 1;
+	if (!version)
+		version = INDEX_FORMAT_DEFAULT;
+	istate->version = version;
+	set_istate_ops(istate);
+}
+
+void change_index_version(struct index_state *istate, int version)
+{
+	istate->version = version;
+	set_istate_ops(istate);
+}
 
 /*****************************************************************
  * Index File I/O
  *****************************************************************/
 
-#define INDEX_FORMAT_DEFAULT 3
-
 /*
  * dev/ino/uid/gid/size are also just tracked to the low 32 bits
  * Again - this is just a (very strong in practice) heuristic that
@@ -1269,7 +1272,8 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
-static int verify_hdr_version(struct cache_header *hdr, unsigned long size)
+static int verify_hdr_version(struct index_state *istate,
+			      struct cache_header *hdr, unsigned long size)
 {
 	int hdr_version;
 
@@ -1278,42 +1282,7 @@ static int verify_hdr_version(struct cache_header *hdr, unsigned long size)
 	hdr_version = ntohl(hdr->hdr_version);
 	if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
 		return error("bad index version %d", hdr_version);
-	return 0;
-}
-
-static int verify_hdr(void *mmap, unsigned long size)
-{
-	git_SHA_CTX c;
-	unsigned char sha1[20];
-
-	if (size < sizeof(struct cache_header) + 20)
-		die("index file smaller than expected");
-
-	git_SHA1_Init(&c);
-	git_SHA1_Update(&c, mmap, size - 20);
-	git_SHA1_Final(sha1, &c);
-	if (hashcmp(sha1, (unsigned char *)mmap + size - 20))
-		return error("bad index file sha1 signature");
-	return 0;
-}
-
-static int read_index_extension(struct index_state *istate,
-				const char *ext, void *data, unsigned long sz)
-{
-	switch (CACHE_EXT(ext)) {
-	case CACHE_EXT_TREE:
-		istate->cache_tree = cache_tree_read(data, sz);
-		break;
-	case CACHE_EXT_RESOLVE_UNDO:
-		istate->resolve_undo = resolve_undo_read(data, sz);
-		break;
-	default:
-		if (*ext < 'A' || 'Z' < *ext)
-			return error("index uses %.4s extension, which we do not understand",
-				     ext);
-		fprintf(stderr, "ignoring %.4s extension\n", ext);
-		break;
-	}
+	initialize_index(istate, hdr_version);
 	return 0;
 }
 
@@ -1322,176 +1291,6 @@ int read_index(struct index_state *istate)
 	return read_index_from(istate, get_index_file());
 }
 
-#ifndef NEEDS_ALIGNED_ACCESS
-#define ntoh_s(var) ntohs(var)
-#define ntoh_l(var) ntohl(var)
-#else
-static inline uint16_t ntoh_s_force_align(void *p)
-{
-	uint16_t x;
-	memcpy(&x, p, sizeof(x));
-	return ntohs(x);
-}
-static inline uint32_t ntoh_l_force_align(void *p)
-{
-	uint32_t x;
-	memcpy(&x, p, sizeof(x));
-	return ntohl(x);
-}
-#define ntoh_s(var) ntoh_s_force_align(&(var))
-#define ntoh_l(var) ntoh_l_force_align(&(var))
-#endif
-
-static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *ondisk,
-						   unsigned int flags,
-						   const char *name,
-						   size_t len)
-{
-	struct cache_entry *ce = xmalloc(cache_entry_size(len));
-
-	ce->ce_stat_data.sd_ctime.sec = ntoh_l(ondisk->ctime.sec);
-	ce->ce_stat_data.sd_mtime.sec = ntoh_l(ondisk->mtime.sec);
-	ce->ce_stat_data.sd_ctime.nsec = ntoh_l(ondisk->ctime.nsec);
-	ce->ce_stat_data.sd_mtime.nsec = ntoh_l(ondisk->mtime.nsec);
-	ce->ce_stat_data.sd_dev   = ntoh_l(ondisk->dev);
-	ce->ce_stat_data.sd_ino   = ntoh_l(ondisk->ino);
-	ce->ce_mode  = ntoh_l(ondisk->mode);
-	ce->ce_stat_data.sd_uid   = ntoh_l(ondisk->uid);
-	ce->ce_stat_data.sd_gid   = ntoh_l(ondisk->gid);
-	ce->ce_stat_data.sd_size  = ntoh_l(ondisk->size);
-	ce->ce_flags = flags & ~CE_NAMEMASK;
-	ce->ce_namelen = len;
-	hashcpy(ce->sha1, ondisk->sha1);
-	memcpy(ce->name, name, len);
-	ce->name[len] = '\0';
-	return ce;
-}
-
-/*
- * Adjacent cache entries tend to share the leading paths, so it makes
- * sense to only store the differences in later entries.  In the v4
- * on-disk format of the index, each on-disk cache entry stores the
- * number of bytes to be stripped from the end of the previous name,
- * and the bytes to append to the result, to come up with its name.
- */
-static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
-{
-	const unsigned char *ep, *cp = (const unsigned char *)cp_;
-	size_t len = decode_varint(&cp);
-
-	if (name->len < len)
-		die("malformed name field in the index");
-	strbuf_remove(name, name->len - len, len);
-	for (ep = cp; *ep; ep++)
-		; /* find the end */
-	strbuf_add(name, cp, ep - cp);
-	return (const char *)ep + 1 - cp_;
-}
-
-static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
-					    unsigned long *ent_size,
-					    struct strbuf *previous_name)
-{
-	struct cache_entry *ce;
-	size_t len;
-	const char *name;
-	unsigned int flags;
-
-	/* On-disk flags are just 16 bits */
-	flags = ntoh_s(ondisk->flags);
-	len = flags & CE_NAMEMASK;
-
-	if (flags & CE_EXTENDED) {
-		struct ondisk_cache_entry_extended *ondisk2;
-		int extended_flags;
-		ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;
-		extended_flags = ntoh_s(ondisk2->flags2) << 16;
-		/* We do not yet understand any bit out of CE_EXTENDED_FLAGS */
-		if (extended_flags & ~CE_EXTENDED_FLAGS)
-			die("Unknown index entry format %08x", extended_flags);
-		flags |= extended_flags;
-		name = ondisk2->name;
-	}
-	else
-		name = ondisk->name;
-
-	if (!previous_name) {
-		/* v3 and earlier */
-		if (len == CE_NAMEMASK)
-			len = strlen(name);
-		ce = cache_entry_from_ondisk(ondisk, flags, name, len);
-
-		*ent_size = ondisk_ce_size(ce);
-	} else {
-		unsigned long consumed;
-		consumed = expand_name_field(previous_name, name);
-		ce = cache_entry_from_ondisk(ondisk, flags,
-					     previous_name->buf,
-					     previous_name->len);
-
-		*ent_size = (name - ((char *)ondisk)) + consumed;
-	}
-	return ce;
-}
-
-static int read_index_v2(struct index_state *istate, void *mmap, unsigned long mmap_size)
-{
-	int i;
-	unsigned long src_offset;
-	struct cache_header *hdr;
-	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
-
-	hdr = mmap;
-
-	istate->version = ntohl(hdr->hdr_version);
-	istate->cache_nr = ntohl(hdr->hdr_entries);
-	istate->cache_alloc = alloc_nr(istate->cache_nr);
-	istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
-	istate->initialized = 1;
-
-	if (istate->version == 4)
-		previous_name = &previous_name_buf;
-	else
-		previous_name = NULL;
-
-	src_offset = sizeof(*hdr);
-	for (i = 0; i < istate->cache_nr; i++) {
-		struct ondisk_cache_entry *disk_ce;
-		struct cache_entry *ce;
-		unsigned long consumed;
-
-		disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
-		ce = create_from_disk(disk_ce, &consumed, previous_name);
-		set_index_entry(istate, i, ce);
-
-		src_offset += consumed;
-	}
-	strbuf_release(&previous_name_buf);
-
-	while (src_offset <= mmap_size - 20 - 8) {
-		/* After an array of active_nr index entries,
-		 * there can be arbitrary number of extended
-		 * sections, each of which is prefixed with
-		 * extension name (4-byte) and section length
-		 * in 4-byte network byte order.
-		 */
-		uint32_t extsize;
-		memcpy(&extsize, (char *)mmap + src_offset + 4, 4);
-		extsize = ntohl(extsize);
-		if (read_index_extension(istate,
-					 (const char *) mmap + src_offset,
-					 (char *) mmap + src_offset + 8,
-					 extsize) < 0)
-			goto unmap;
-		src_offset += 8;
-		src_offset += extsize;
-	}
-	return 0;
-unmap:
-	munmap(mmap, mmap_size);
-	die("index file corrupt");
-}
-
 /* remember to discard_cache() before reading a different cache! */
 int read_index_from(struct index_state *istate, const char *path)
 {
@@ -1508,10 +1307,13 @@ int read_index_from(struct index_state *istate, const char *path)
 	errno = ENOENT;
 	istate->timestamp.sec = 0;
 	istate->timestamp.nsec = 0;
+
 	fd = open(path, O_RDONLY);
 	if (fd < 0) {
-		if (errno == ENOENT)
+		if (errno == ENOENT) {
+			initialize_index(istate, 0);
 			return 0;
+		}
 		die_errno("index file open failed");
 	}
 
@@ -1520,24 +1322,23 @@ int read_index_from(struct index_state *istate, const char *path)
 
 	errno = EINVAL;
 	mmap_size = xsize_t(st.st_size);
-	if (mmap_size < sizeof(struct cache_header) + 20)
-		die("index file smaller than expected");
-
 	mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
 	close(fd);
 	if (mmap == MAP_FAILED)
 		die_errno("unable to map index file");
 
 	hdr = mmap;
-	if (verify_hdr_version(hdr, mmap_size) < 0)
+	if (verify_hdr_version(istate, hdr, mmap_size) < 0)
 		goto unmap;
 
-	if (verify_hdr(mmap, mmap_size) < 0)
+	if (istate->ops->verify_hdr(mmap, mmap_size) < 0)
 		goto unmap;
 
-	read_index_v2(istate, mmap, mmap_size);
+	if (istate->ops->read_index(istate, mmap, mmap_size) < 0)
+		goto unmap;
 	istate->timestamp.sec = st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
+
 	munmap(mmap, mmap_size);
 	return istate->cache_nr;
 
@@ -1568,6 +1369,7 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	istate->ops = NULL;
 	return 0;
 }
 
@@ -1581,201 +1383,6 @@ int unmerged_index(const struct index_state *istate)
 	return 0;
 }
 
-#define WRITE_BUFFER_SIZE 8192
-static unsigned char write_buffer[WRITE_BUFFER_SIZE];
-static unsigned long write_buffer_len;
-
-static int ce_write_flush(git_SHA_CTX *context, int fd)
-{
-	unsigned int buffered = write_buffer_len;
-	if (buffered) {
-		git_SHA1_Update(context, write_buffer, buffered);
-		if (write_in_full(fd, write_buffer, buffered) != buffered)
-			return -1;
-		write_buffer_len = 0;
-	}
-	return 0;
-}
-
-static int ce_write(git_SHA_CTX *context, int fd, void *data, unsigned int len)
-{
-	while (len) {
-		unsigned int buffered = write_buffer_len;
-		unsigned int partial = WRITE_BUFFER_SIZE - buffered;
-		if (partial > len)
-			partial = len;
-		memcpy(write_buffer + buffered, data, partial);
-		buffered += partial;
-		if (buffered == WRITE_BUFFER_SIZE) {
-			write_buffer_len = buffered;
-			if (ce_write_flush(context, fd))
-				return -1;
-			buffered = 0;
-		}
-		write_buffer_len = buffered;
-		len -= partial;
-		data = (char *) data + partial;
-	}
-	return 0;
-}
-
-static int write_index_ext_header(git_SHA_CTX *context, int fd,
-				  unsigned int ext, unsigned int sz)
-{
-	ext = htonl(ext);
-	sz = htonl(sz);
-	return ((ce_write(context, fd, &ext, 4) < 0) ||
-		(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;
-}
-
-static int ce_flush(git_SHA_CTX *context, int fd)
-{
-	unsigned int left = write_buffer_len;
-
-	if (left) {
-		write_buffer_len = 0;
-		git_SHA1_Update(context, write_buffer, left);
-	}
-
-	/* Flush first if not enough space for SHA1 signature */
-	if (left + 20 > WRITE_BUFFER_SIZE) {
-		if (write_in_full(fd, write_buffer, left) != left)
-			return -1;
-		left = 0;
-	}
-
-	/* Append the SHA1 signature at the end */
-	git_SHA1_Final(write_buffer + left, context);
-	left += 20;
-	return (write_in_full(fd, write_buffer, left) != left) ? -1 : 0;
-}
-
-static void ce_smudge_racily_clean_entry(struct cache_entry *ce)
-{
-	/*
-	 * The only thing we care about in this function is to smudge the
-	 * falsely clean entry due to touch-update-touch race, so we leave
-	 * everything else as they are.  We are called for entries whose
-	 * ce_stat_data.sd_mtime match the index file mtime.
-	 *
-	 * Note that this actually does not do much for gitlinks, for
-	 * which ce_match_stat_basic() always goes to the actual
-	 * contents.  The caller checks with is_racy_timestamp() which
-	 * always says "no" for gitlinks, so we are not called for them ;-)
-	 */
-	struct stat st;
-
-	if (lstat(ce->name, &st) < 0)
-		return;
-	if (ce_match_stat_basic(ce, &st))
-		return;
-	if (ce_modified_check_fs(ce, &st)) {
-		/* This is "racily clean"; smudge it.  Note that this
-		 * is a tricky code.  At first glance, it may appear
-		 * that it can break with this sequence:
-		 *
-		 * $ echo xyzzy >frotz
-		 * $ git-update-index --add frotz
-		 * $ : >frotz
-		 * $ sleep 3
-		 * $ echo filfre >nitfol
-		 * $ git-update-index --add nitfol
-		 *
-		 * but it does not.  When the second update-index runs,
-		 * it notices that the entry "frotz" has the same timestamp
-		 * as index, and if we were to smudge it by resetting its
-		 * size to zero here, then the object name recorded
-		 * in index is the 6-byte file but the cached stat information
-		 * becomes zero --- which would then match what we would
-		 * obtain from the filesystem next time we stat("frotz").
-		 *
-		 * However, the second update-index, before calling
-		 * this function, notices that the cached size is 6
-		 * bytes and what is on the filesystem is an empty
-		 * file, and never calls us, so the cached size information
-		 * for "frotz" stays 6 which does not match the filesystem.
-		 */
-		ce->ce_stat_data.sd_size = 0;
-	}
-}
-
-/* Copy miscellaneous fields but not the name */
-static char *copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk,
-				       struct cache_entry *ce)
-{
-	short flags;
-
-	ondisk->ctime.sec = htonl(ce->ce_stat_data.sd_ctime.sec);
-	ondisk->mtime.sec = htonl(ce->ce_stat_data.sd_mtime.sec);
-	ondisk->ctime.nsec = htonl(ce->ce_stat_data.sd_ctime.nsec);
-	ondisk->mtime.nsec = htonl(ce->ce_stat_data.sd_mtime.nsec);
-	ondisk->dev  = htonl(ce->ce_stat_data.sd_dev);
-	ondisk->ino  = htonl(ce->ce_stat_data.sd_ino);
-	ondisk->mode = htonl(ce->ce_mode);
-	ondisk->uid  = htonl(ce->ce_stat_data.sd_uid);
-	ondisk->gid  = htonl(ce->ce_stat_data.sd_gid);
-	ondisk->size = htonl(ce->ce_stat_data.sd_size);
-	hashcpy(ondisk->sha1, ce->sha1);
-
-	flags = ce->ce_flags;
-	flags |= (ce_namelen(ce) >= CE_NAMEMASK ? CE_NAMEMASK : ce_namelen(ce));
-	ondisk->flags = htons(flags);
-	if (ce->ce_flags & CE_EXTENDED) {
-		struct ondisk_cache_entry_extended *ondisk2;
-		ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;
-		ondisk2->flags2 = htons((ce->ce_flags & CE_EXTENDED_FLAGS) >> 16);
-		return ondisk2->name;
-	}
-	else {
-		return ondisk->name;
-	}
-}
-
-static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce,
-			  struct strbuf *previous_name)
-{
-	int size;
-	struct ondisk_cache_entry *ondisk;
-	char *name;
-	int result;
-
-	if (!previous_name) {
-		size = ondisk_ce_size(ce);
-		ondisk = xcalloc(1, size);
-		name = copy_cache_entry_to_ondisk(ondisk, ce);
-		memcpy(name, ce->name, ce_namelen(ce));
-	} else {
-		int common, to_remove, prefix_size;
-		unsigned char to_remove_vi[16];
-		for (common = 0;
-		     (ce->name[common] &&
-		      common < previous_name->len &&
-		      ce->name[common] == previous_name->buf[common]);
-		     common++)
-			; /* still matching */
-		to_remove = previous_name->len - common;
-		prefix_size = encode_varint(to_remove, to_remove_vi);
-
-		if (ce->ce_flags & CE_EXTENDED)
-			size = offsetof(struct ondisk_cache_entry_extended, name);
-		else
-			size = offsetof(struct ondisk_cache_entry, name);
-		size += prefix_size + (ce_namelen(ce) - common + 1);
-
-		ondisk = xcalloc(1, size);
-		name = copy_cache_entry_to_ondisk(ondisk, ce);
-		memcpy(name, to_remove_vi, prefix_size);
-		memcpy(name + prefix_size, ce->name + common, ce_namelen(ce) - common);
-
-		strbuf_splice(previous_name, common, to_remove,
-			      ce->name + common, ce_namelen(ce) - common);
-	}
-
-	result = ce_write(c, fd, ondisk, size);
-	free(ondisk);
-	return result;
-}
-
 static int has_racy_timestamp(struct index_state *istate)
 {
 	int entries = istate->cache_nr;
@@ -1801,101 +1408,9 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
 		rollback_lock_file(lockfile);
 }
 
-static int write_index_v2(struct index_state *istate, int newfd)
-{
-	git_SHA_CTX c;
-	struct cache_header hdr;
-	int i, err, removed, extended, hdr_version;
-	struct cache_entry **cache = istate->cache;
-	int entries = istate->cache_nr;
-	struct stat st;
-	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
-
-	for (i = removed = extended = 0; i < entries; i++) {
-		if (cache[i]->ce_flags & CE_REMOVE)
-			removed++;
-
-		/* reduce extended entries if possible */
-		cache[i]->ce_flags &= ~CE_EXTENDED;
-		if (cache[i]->ce_flags & CE_EXTENDED_FLAGS) {
-			extended++;
-			cache[i]->ce_flags |= CE_EXTENDED;
-		}
-	}
-
-	if (!istate->version)
-		istate->version = INDEX_FORMAT_DEFAULT;
-
-	/* demote version 3 to version 2 when the latter suffices */
-	if (istate->version == 3 || istate->version == 2)
-		istate->version = extended ? 3 : 2;
-
-	hdr_version = istate->version;
-
-	hdr.hdr_signature = htonl(CACHE_SIGNATURE);
-	hdr.hdr_version = htonl(hdr_version);
-	hdr.hdr_entries = htonl(entries - removed);
-
-	git_SHA1_Init(&c);
-	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
-		return -1;
-
-	previous_name = (hdr_version == 4) ? &previous_name_buf : NULL;
-	for (i = 0; i < entries; i++) {
-		struct cache_entry *ce = cache[i];
-		if (ce->ce_flags & CE_REMOVE)
-			continue;
-		if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
-			ce_smudge_racily_clean_entry(ce);
-		if (is_null_sha1(ce->sha1)) {
-			static const char msg[] = "cache entry has null sha1: %s";
-			static int allow = -1;
-
-			if (allow < 0)
-				allow = git_env_bool("GIT_ALLOW_NULL_SHA1", 0);
-			if (allow)
-				warning(msg, ce->name);
-			else
-				return error(msg, ce->name);
-		}
-		if (ce_write_entry(&c, newfd, ce, previous_name) < 0)
-			return -1;
-	}
-	strbuf_release(&previous_name_buf);
-
-	/* Write extension data here */
-	if (istate->cache_tree) {
-		struct strbuf sb = STRBUF_INIT;
-
-		cache_tree_write(&sb, istate->cache_tree);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_TREE, sb.len) < 0
-			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
-		strbuf_release(&sb);
-		if (err)
-			return -1;
-	}
-	if (istate->resolve_undo) {
-		struct strbuf sb = STRBUF_INIT;
-
-		resolve_undo_write(&sb, istate->resolve_undo);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_RESOLVE_UNDO,
-					     sb.len) < 0
-			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
-		strbuf_release(&sb);
-		if (err)
-			return -1;
-	}
-
-	if (ce_flush(&c, newfd) || fstat(newfd, &st))
-		return -1;
-	istate->timestamp.sec = (unsigned int)st.st_mtime;
-	istate->timestamp.nsec = ST_MTIME_NSEC(st);
-	return 0;
-}
-
 int write_index(struct index_state *istate, int newfd)
 {
-	return write_index_v2(istate, newfd);
+	return istate->ops->write_index(istate, newfd);
 }
 
 /*
diff --git a/read-cache.h b/read-cache.h
new file mode 100644
index 0000000..ceedcae
--- /dev/null
+++ b/read-cache.h
@@ -0,0 +1,63 @@
+#ifndef READ_CACHE_H
+#define READ_CACHE_H
+
+/* Index extensions.
+ *
+ * The first letter should be 'A'..'Z' for extensions that are not
+ * necessary for a correct operation (i.e. optimization data).
+ * When new extensions are added that _needs_ to be understood in
+ * order to correctly interpret the index file, pick character that
+ * is outside the range, to cause the reader to abort.
+ */
+
+#define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
+#define CACHE_EXT_TREE 0x54524545	/* "TREE" */
+#define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
+
+#define INDEX_FORMAT_DEFAULT 3
+
+/*
+ * Basic data structures for the directory cache
+ */
+struct cache_header {
+	uint32_t hdr_signature;
+	uint32_t hdr_version;
+	uint32_t hdr_entries;
+};
+
+struct index_ops {
+	int (*match_stat_basic)(const struct cache_entry *ce, struct stat *st, int changed);
+	int (*verify_hdr)(void *mmap, unsigned long size);
+	int (*read_index)(struct index_state *istate, void *mmap, unsigned long mmap_size);
+	int (*write_index)(struct index_state *istate, int newfd);
+};
+
+extern struct index_ops v2_ops;
+
+#ifndef NEEDS_ALIGNED_ACCESS
+#define ntoh_s(var) ntohs(var)
+#define ntoh_l(var) ntohl(var)
+#else
+static inline uint16_t ntoh_s_force_align(void *p)
+{
+	uint16_t x;
+	memcpy(&x, p, sizeof(x));
+	return ntohs(x);
+}
+static inline uint32_t ntoh_l_force_align(void *p)
+{
+	uint32_t x;
+	memcpy(&x, p, sizeof(x));
+	return ntohl(x);
+}
+#define ntoh_s(var) ntoh_s_force_align(&(var))
+#define ntoh_l(var) ntoh_l_force_align(&(var))
+#endif
+
+extern int ce_modified_check_fs(const struct cache_entry *ce, struct stat *st);
+extern int ce_match_stat_basic(const struct index_state *istate,
+			       const struct cache_entry *ce, struct stat *st);
+extern int is_racy_timestamp(const struct index_state *istate, const struct cache_entry *ce);
+extern void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce);
+
+#endif
diff --git a/test-index-version.c b/test-index-version.c
index 05d4699..d3c0ebd 100644
--- a/test-index-version.c
+++ b/test-index-version.c
@@ -1,5 +1,11 @@
 #include "cache.h"
 
+struct cache_header {
+	uint32_t hdr_signature;
+	uint32_t hdr_version;
+	uint32_t hdr_entries;
+};
+
 int main(int argc, char **argv)
 {
 	struct cache_header hdr;
diff --git a/unpack-trees.c b/unpack-trees.c
index 35cb05e..8d07c2a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1035,10 +1035,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	}
 
 	memset(&o->result, 0, sizeof(o->result));
-	o->result.initialized = 1;
+	initialize_index(&o->result, o->src_index->version);
 	o->result.timestamp.sec = o->src_index->timestamp.sec;
 	o->result.timestamp.nsec = o->src_index->timestamp.nsec;
-	o->result.version = o->src_index->version;
 	o->merge_size = len;
 	mark_all_ce_unused(o->src_index);
 
-- 
1.8.4.2

* [PATCH v4 04/24] read-cache: Re-read index if index file changed
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (2 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 03/24] read-cache: move index v2 specific functions to their own file Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 05/24] add documentation for the index api Thomas Gummerer
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Re-read the index file if it changed while we were reading it.

The index file might change during the read, causing outdated
information to be displayed.  Detect this by using the file's stat
data as a heuristic: record it before the read, check it again
afterwards, and retry (up to 50 times) if it no longer matches.

Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
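For readers unfamiliar with the stat-based heuristic, here is a minimal
standalone sketch of the idea in plain POSIX C.  It is not the code in
this patch: `file_snapshot`, `snapshot_from_stat()`, `snapshot_matches()`
and `read_stable()` are hypothetical names standing in for the
stat_validity helpers, and it uses read() instead of mmap() to stay short.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_ATTEMPTS 50 /* same bound as the loop in this patch */

/* Hypothetical stand-in for stat_validity: the fields we compare. */
struct file_snapshot {
	off_t size;
	time_t mtime;
	ino_t ino;
	dev_t dev;
};

static void snapshot_from_stat(struct file_snapshot *snap, const struct stat *st)
{
	snap->size = st->st_size;
	snap->mtime = st->st_mtime;
	snap->ino = st->st_ino;
	snap->dev = st->st_dev;
}

/* Return 1 if the file at path still looks like the snapshot, else 0. */
static int snapshot_matches(const struct file_snapshot *snap, const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return 0;
	return snap->size == st.st_size && snap->mtime == st.st_mtime &&
	       snap->ino == st.st_ino && snap->dev == st.st_dev;
}

/*
 * Read the whole file into a malloc'd, NUL-terminated buffer, retrying
 * if the file's stat data changed while we were reading.  Returns the
 * buffer (caller frees) and stores its length in *len, or NULL on failure.
 */
static char *read_stable(const char *path, size_t *len)
{
	int attempt;

	for (attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
		struct file_snapshot snap;
		struct stat st;
		char *buf;
		size_t want, got = 0;
		ssize_t n = 0;
		int fd = open(path, O_RDONLY);

		if (fd < 0)
			return NULL;
		if (fstat(fd, &st) < 0) {
			close(fd);
			return NULL;
		}
		/* Record the stat data of the file we are about to read. */
		snapshot_from_stat(&snap, &st);
		want = (size_t)st.st_size;

		buf = malloc(want + 1);
		if (!buf) {
			close(fd);
			return NULL;
		}
		while (got < want && (n = read(fd, buf + got, want - got)) > 0)
			got += (size_t)n;
		close(fd);

		/* Accept only a complete read of a file that is unchanged. */
		if (n >= 0 && got == want && snapshot_matches(&snap, path)) {
			buf[got] = '\0';
			*len = got;
			return buf;
		}
		free(buf); /* the file changed under us; try again */
	}
	return NULL;
}
```

A caller would use `buf = read_stable(path, &len)` and treat NULL as
"could not get a stable read".  Because the comparison includes the
inode, a file replaced by rename() is detected as changed even when
size and mtime happen to match.  It remains a heuristic: an in-place
rewrite of the same size within the same mtime second would go unnoticed.
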
 read-cache.c | 65 +++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 34 insertions(+), 31 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index e081084..51be1bb 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1294,8 +1294,8 @@ int read_index(struct index_state *istate)
 /* remember to discard_cache() before reading a different cache! */
 int read_index_from(struct index_state *istate, const char *path)
 {
-	int fd;
-	struct stat st;
+	int fd, err, i;
+	struct stat_validity sv;
 	struct cache_header *hdr;
 	void *mmap;
 	size_t mmap_size;
@@ -1307,43 +1307,46 @@ int read_index_from(struct index_state *istate, const char *path)
 	errno = ENOENT;
 	istate->timestamp.sec = 0;
 	istate->timestamp.nsec = 0;
-
-	fd = open(path, O_RDONLY);
-	if (fd < 0) {
-		if (errno == ENOENT) {
-			initialize_index(istate, 0);
-			return 0;
+	sv.sd = NULL;
+	for (i = 0; i < 50; i++) {
+		err = 0;
+		fd = open(path, O_RDONLY);
+		if (fd < 0) {
+			if (errno == ENOENT) {
+				initialize_index(istate, 0);
+				return 0;
+			}
+			die_errno("index file open failed");
 		}
-		die_errno("index file open failed");
-	}
 
-	if (fstat(fd, &st))
-		die_errno("cannot stat the open index");
+		stat_validity_update(&sv, fd);
+		if (!sv.sd)
+			die_errno("cannot stat the open index");
 
-	errno = EINVAL;
-	mmap_size = xsize_t(st.st_size);
-	mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
-	close(fd);
-	if (mmap == MAP_FAILED)
-		die_errno("unable to map index file");
+		errno = EINVAL;
+		mmap_size = xsize_t(sv.sd->sd_size);
+		mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+		close(fd);
+		if (mmap == MAP_FAILED)
+			die_errno("unable to map index file");
 
-	hdr = mmap;
-	if (verify_hdr_version(istate, hdr, mmap_size) < 0)
-		goto unmap;
+		hdr = mmap;
+		if (verify_hdr_version(istate, hdr, mmap_size) < 0)
+			err = 1;
 
-	if (istate->ops->verify_hdr(mmap, mmap_size) < 0)
-		goto unmap;
+		if (!err && istate->ops->verify_hdr(mmap, mmap_size) < 0)
+			err = 1;
 
-	if (istate->ops->read_index(istate, mmap, mmap_size) < 0)
-		goto unmap;
-	istate->timestamp.sec = st.st_mtime;
-	istate->timestamp.nsec = ST_MTIME_NSEC(st);
+		if (!err && istate->ops->read_index(istate, mmap, mmap_size) < 0)
+			err = 1;
+		istate->timestamp.sec = sv.sd->sd_mtime.sec;
+		istate->timestamp.nsec = sv.sd->sd_mtime.nsec;
 
-	munmap(mmap, mmap_size);
-	return istate->cache_nr;
+		munmap(mmap, mmap_size);
+		if (stat_validity_check(&sv, path) && !err)
+			return istate->cache_nr;
+	}
 
-unmap:
-	munmap(mmap, mmap_size);
 	die("index file corrupt");
 }
 
-- 
1.8.4.2

* [PATCH v4 05/24] add documentation for the index api
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (3 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 04/24] read-cache: Re-read index if index file changed Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 06/24] read-cache: add index reading api Thomas Gummerer
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Add documentation for the index reading api.  This also includes
documentation for the new api functions introduced in the next patch.
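
The return convention documented for index_name_pos can be illustrated
with a small stand-alone sketch.  It assumes a plain sorted array of
strings instead of an index_state and ignores stages; name_pos() is a
hypothetical stand-in, not the function in read-cache.c.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for index_name_pos(): search a sorted array
 * of names.  Returns pos on an exact match, otherwise -1-pos, where
 * pos is the position at which name would have to be inserted.
 */
static int name_pos(const char **names, int nr, const char *name)
{
	int first = 0, last = nr;

	assert(names && name);
	while (first < last) {
		int mid = first + (last - first) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			last = mid;
		else
			first = mid + 1;
	}
	return -1 - first;
}
```

With the four names from the documentation example (file1, file2,
path/file1, zzz), looking up "path/file1" gives 2 and looking up
"path" gives -3.  A caller can recover the insertion position from a
negative result with -1 - result.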

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Documentation/technical/api-in-core-index.txt | 56 +++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/api-in-core-index.txt b/Documentation/technical/api-in-core-index.txt
index adbdbf5..2cf7a71 100644
--- a/Documentation/technical/api-in-core-index.txt
+++ b/Documentation/technical/api-in-core-index.txt
@@ -1,14 +1,62 @@
 in-core index API
 =================
 
+Reading API
+-----------
+
+`cache`::
+
+	An array of cache entries.  This is used to access the cache
+	entries directly.  Use `index_name_pos` to search for the
+	index of a specific cache entry.
+
+`read_index_filtered`::
+
+	Read a part of the index, filtered by the pathspec given in
+	the opts.  The function may load more than necessary, so the
+	caller is still responsible for applying filters appropriately.  The
+	filtering is only done for performance reasons, as it's
+	possible to only read part of the index when the on-disk
+	format is index-v5.
++
+To iterate only over the entries that match the pathspec, use
+the for_each_index_entry function.
+
+`read_index`::
+
+	Read the whole index file from disk.
+
+`index_name_pos`::
+
+	Find a cache_entry with name in the index.  Returns pos if an
+	entry is matched exactly, and -1-pos if there is no exact match,
+	where pos is the position at which it would be inserted. e.g.
++
+....
+index:
+	file1
+	file2
+	path/file1
+	zzz
+....
++
+`index_name_pos("path/file1", 10)` returns 2, while
+`index_name_pos("path", 4)` returns -3
+
+`for_each_index_entry`::
+
+	Iterates over all cache_entries in the index filtered by
+	filter_opts in the index_state.  For each cache entry fn is
+	executed with cb_data as callback data.  From within the loop
+	do `return 0` to continue, or `return 1` to break the loop.
+
+TODO
+----
 Talk about <read-cache.c> and <cache-tree.c>, things like:
 
-* cache -> the_index macros
-* read_index()
 * write_index()
 * ie_match_stat() and ie_modified(); how they are different and when to
   use which.
-* index_name_pos()
 * remove_index_entry_at()
 * remove_file_from_index()
 * add_file_to_index()
@@ -18,4 +66,4 @@ Talk about <read-cache.c> and <cache-tree.c>, things like:
 * cache_tree_invalidate_path()
 * cache_tree_update()
 
-(JC, Linus)
+(JC, Linus, Thomas Gummerer)
-- 
1.8.4.2

* [PATCH v4 06/24] read-cache: add index reading api
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (4 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 05/24] add documentation for the index api Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 07/24] make sure partially read index is not changed Thomas Gummerer
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Add an api for access to the index file.  Currently there is only a very
basic api for accessing the index file, which only allows a full read of
the index, and lets the users of the data filter it.  The new index api
gives the users the possibility to use only part of the index and
provides functions for iterating over and accessing cache entries.

This simplifies future improvements to the in-memory format, as changes
will be concentrated on one file, instead of the whole git source code.
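
The callback-based iteration can be sketched in isolation like this.
It is a simplified model, assuming a flat array of entries and a plain
prefix comparison in place of match_pathspec_depth(); struct entry,
for_each_entry() and count_up_to_limit() are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <string.h>

struct entry {
	const char *name;
};

/* Return 0 to continue with the next entry, non-zero to stop. */
typedef int each_entry_fn(const struct entry *e, void *cb_data);

/*
 * Call fn for every entry whose name starts with prefix (NULL or ""
 * matches everything).  Returns the last value returned by fn.
 */
static int for_each_entry(const struct entry *entries, int nr,
			  const char *prefix,
			  each_entry_fn fn, void *cb_data)
{
	int i, ret = 0;
	size_t prefix_len = prefix ? strlen(prefix) : 0;

	assert(fn);
	for (i = 0; i < nr; i++) {
		if (prefix_len &&
		    strncmp(entries[i].name, prefix, prefix_len))
			continue;
		if ((ret = fn(&entries[i], cb_data)))
			break;
	}
	return ret;
}

struct count_data {
	int seen;
	int limit;
};

/* Example callback: count entries, stop once limit is reached. */
static int count_up_to_limit(const struct entry *e, void *cb_data)
{
	struct count_data *d = cb_data;

	(void)e;
	d->seen++;
	return d->seen >= d->limit;
}
```

The point of the design is that callers never index the entry array
themselves, so the filtering and the in-memory layout can change
without touching them.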

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 cache.h         | 41 ++++++++++++++++++++++++++++++++++++++++-
 read-cache-v2.c | 10 ++++++++--
 read-cache.c    | 47 +++++++++++++++++++++++++++++++++++++++++++----
 read-cache.h    |  3 ++-
 4 files changed, 93 insertions(+), 8 deletions(-)

diff --git a/cache.h b/cache.h
index 290e26d..38d57e7 100644
--- a/cache.h
+++ b/cache.h
@@ -127,7 +127,7 @@ struct cache_entry {
 	unsigned int ce_flags;
 	unsigned int ce_namelen;
 	unsigned char sha1[20];
-	struct cache_entry *next;
+	struct cache_entry *next; /* used by name_hash */
 	char name[FLEX_ARRAY]; /* more */
 };
 
@@ -260,6 +260,29 @@ static inline unsigned int canon_mode(unsigned int mode)
 
 #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
 
+/*
+ * Options by which the index should be filtered when read partially.
+ *
+ * pathspec: The pathspec which the index entries have to match
+ * seen: Used to return the seen parameter from match_pathspec()
+ * max_prefix_len: The common prefix length of the pathspecs
+ *
+ * read_staged: used to indicate if the conflicted entries (entries
+ *     with a stage) should be included
+ * read_cache_tree: used to indicate if the cache-tree should be read
+ * read_resolve_undo: used to indicate if the resolve undo data should
+ *     be read
+ */
+struct filter_opts {
+	const struct pathspec *pathspec;
+	char *seen;
+	int max_prefix_len;
+
+	int read_staged;
+	int read_cache_tree;
+	int read_resolve_undo;
+};
+
 struct index_state {
 	struct cache_entry **cache;
 	unsigned int version;
@@ -272,6 +295,7 @@ struct index_state {
 	struct hash_table name_hash;
 	struct hash_table dir_hash;
 	struct index_ops *ops;
+	struct filter_opts *filter_opts;
 };
 
 extern struct index_state the_index;
@@ -317,6 +341,12 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+
+/* index api */
+#define read_cache_filtered(opts) read_index_filtered(&the_index, (opts))
+#define read_cache_filtered_from(path, opts) read_index_filtered_from(&the_index, (path), (opts))
+#define for_each_cache_entry(fn, cb_data) \
+	for_each_index_entry(&the_index, (fn), (cb_data))
 #endif
 
 enum object_type {
@@ -449,6 +479,15 @@ extern void sanitize_stdfds(void);
 		} \
 	} while (0)
 
+/* index api */
+extern int read_index_filtered(struct index_state *, struct filter_opts *opts);
+extern int read_index_filtered_from(struct index_state *, const char *path, struct filter_opts *opts);
+
+typedef int each_cache_entry_fn(struct cache_entry *ce, void *);
+extern int for_each_index_entry(struct index_state *istate,
+				each_cache_entry_fn, void *);
+
+
 /* Initialize and use the cache information */
 extern void initialize_index(struct index_state *istate, int version);
 extern void change_index_version(struct index_state *istate, int version);
diff --git a/read-cache-v2.c b/read-cache-v2.c
index a7d076c..f884c10 100644
--- a/read-cache-v2.c
+++ b/read-cache-v2.c
@@ -3,6 +3,7 @@
 #include "resolve-undo.h"
 #include "cache-tree.h"
 #include "varint.h"
+#include "dir.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 #define CE_NAMEMASK  (0x0fff)
@@ -202,8 +203,14 @@ static int read_index_extension(struct index_state *istate,
 	return 0;
 }
 
+/*
+ * The performance is the same if we read the whole index or only
+ * part of it, therefore we always read the whole index to avoid
+ * having to re-read it later.  The filter_opts will determine
+ * what part of the index is used when retrieving the cache-entries.
+ */
 static int read_index_v2(struct index_state *istate, void *mmap,
-			 unsigned long mmap_size)
+			 unsigned long mmap_size, struct filter_opts *opts)
 {
 	int i;
 	unsigned long src_offset;
@@ -229,7 +236,6 @@ static int read_index_v2(struct index_state *istate, void *mmap,
 		disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
 		ce = create_from_disk(disk_ce, &consumed, previous_name);
 		set_index_entry(istate, i, ce);
-
 		src_offset += consumed;
 	}
 	strbuf_release(&previous_name_buf);
diff --git a/read-cache.c b/read-cache.c
index 51be1bb..01f5397 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1288,11 +1288,41 @@ static int verify_hdr_version(struct index_state *istate,
 
 int read_index(struct index_state *istate)
 {
-	return read_index_from(istate, get_index_file());
+	return read_index_filtered_from(istate, get_index_file(), NULL);
 }
 
-/* remember to discard_cache() before reading a different cache! */
-int read_index_from(struct index_state *istate, const char *path)
+int read_index_filtered(struct index_state *istate, struct filter_opts *opts)
+{
+	return read_index_filtered_from(istate, get_index_file(), opts);
+}
+
+/*
+ * Execute fn for each index entry which is currently in istate.  Data
+ * can be given to the function using the cb_data parameter.
+ */
+int for_each_index_entry(struct index_state *istate, each_cache_entry_fn fn, void *cb_data)
+{
+	int i, ret = 0;
+	struct filter_opts *opts = istate->filter_opts;
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+
+		if (opts && !opts->read_staged && ce_stage(ce))
+			continue;
+
+		if (opts && !match_pathspec_depth(opts->pathspec, ce->name, ce_namelen(ce),
+						  opts->max_prefix_len, opts->seen))
+			continue;
+
+		if ((ret = fn(istate->cache[i], cb_data)))
+			break;
+	}
+	return ret;
+}
+
+int read_index_filtered_from(struct index_state *istate, const char *path,
+			     struct filter_opts *opts)
 {
 	int fd, err, i;
 	struct stat_validity sv;
@@ -1307,6 +1337,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	errno = ENOENT;
 	istate->timestamp.sec = 0;
 	istate->timestamp.nsec = 0;
+	istate->filter_opts = opts;
 	sv.sd = NULL;
 	for (i = 0; i < 50; i++) {
 		err = 0;
@@ -1337,7 +1368,7 @@ int read_index_from(struct index_state *istate, const char *path)
 		if (!err && istate->ops->verify_hdr(mmap, mmap_size) < 0)
 			err = 1;
 
-		if (!err && istate->ops->read_index(istate, mmap, mmap_size) < 0)
+		if (!err && istate->ops->read_index(istate, mmap, mmap_size, opts) < 0)
 			err = 1;
 		istate->timestamp.sec = sv.sd->sd_mtime.sec;
 		istate->timestamp.nsec = sv.sd->sd_mtime.nsec;
@@ -1350,6 +1381,13 @@ int read_index_from(struct index_state *istate, const char *path)
 	die("index file corrupt");
 }
 
+
+/* remember to discard_cache() before reading a different cache! */
+int read_index_from(struct index_state *istate, const char *path)
+{
+	return read_index_filtered_from(istate, path, NULL);
+}
+
 int is_index_unborn(struct index_state *istate)
 {
 	return (!istate->cache_nr && !istate->timestamp.sec);
@@ -1373,6 +1411,7 @@ int discard_index(struct index_state *istate)
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
 	istate->ops = NULL;
+	istate->filter_opts = NULL;
 	return 0;
 }
 
diff --git a/read-cache.h b/read-cache.h
index ceedcae..f920546 100644
--- a/read-cache.h
+++ b/read-cache.h
@@ -28,7 +28,8 @@ struct cache_header {
 struct index_ops {
 	int (*match_stat_basic)(const struct cache_entry *ce, struct stat *st, int changed);
 	int (*verify_hdr)(void *mmap, unsigned long size);
-	int (*read_index)(struct index_state *istate, void *mmap, unsigned long mmap_size);
+	int (*read_index)(struct index_state *istate, void *mmap, unsigned long mmap_size,
+			  struct filter_opts *opts);
 	int (*write_index)(struct index_state *istate, int newfd);
 };
 
-- 
1.8.4.2

* [PATCH v4 07/24] make sure partially read index is not changed
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (5 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 06/24] read-cache: add index reading api Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 08/24] grep.c: use index api Thomas Gummerer
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

A partially read index file currently cannot be written to disk.  Make
sure that never happens by erroring out when a caller tries to write a
partially read index.  Do the same when trying to re-read a partially
read index without having discarded it first to avoid losing any
information.

Forcing the caller to load the right part of the index file, instead of
re-reading it when changing it, gives a bit of a performance advantage
by avoiding reading parts of the index twice.
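
The guard amounts to the following pattern, shown here as a toy model.
struct toy_index, toy_write() and toy_discard() are hypothetical, and
the sketch returns -1 where the patch calls die("BUG: ...").

```c
#include <assert.h>
#include <stddef.h>

struct toy_index {
	int nr_entries;
	const void *filter; /* non-NULL while only part is loaded */
};

/* Returns 0 on success, -1 if the index was only partially read. */
static int toy_write(const struct toy_index *idx)
{
	assert(idx);
	if (idx->filter)
		return -1; /* the patch dies here instead of returning */
	/* ... a real implementation would serialize idx here ... */
	return 0;
}

/* Forget the loaded entries and the filter they were read with. */
static void toy_discard(struct toy_index *idx)
{
	assert(idx);
	idx->nr_entries = 0;
	idx->filter = NULL;
}
```

A caller that wants to write has to discard the filtered state and
read the full index first, which mirrors the rule described above.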

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 read-cache.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 01f5397..7020f26 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1330,6 +1330,8 @@ int read_index_filtered_from(struct index_state *istate, const char *path,
 	void *mmap;
 	size_t mmap_size;
 
+	if (istate->filter_opts)
+		die("BUG: cannot re-read partially read index");
 	errno = EBUSY;
 	if (istate->initialized)
 		return istate->cache_nr;
@@ -1452,6 +1454,8 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
 
 int write_index(struct index_state *istate, int newfd)
 {
+	if (istate->filter_opts)
+		die("BUG: cannot write a partially read index");
 	return istate->ops->write_index(istate, newfd);
 }
 
-- 
1.8.4.2

* [PATCH v4 08/24] grep.c: use index api
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (6 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 07/24] make sure partially read index is not changed Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 09/24] ls-files.c: " Thomas Gummerer
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/grep.c | 69 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 35 insertions(+), 34 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 63f8603..36c2bf0 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -369,41 +369,31 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
 	free(argv);
 }
 
-static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached)
+struct grep_opts {
+	struct grep_opt *opt;
+	const struct pathspec *pathspec;
+	int cached;
+	int hit;
+};
+
+static int grep_cache(struct cache_entry *ce, void *cb_data)
 {
-	int hit = 0;
-	int nr;
-	read_cache();
+	struct grep_opts *opts = cb_data;
 
-	for (nr = 0; nr < active_nr; nr++) {
-		const struct cache_entry *ce = active_cache[nr];
-		if (!S_ISREG(ce->ce_mode))
-			continue;
-		if (!match_pathspec_depth(pathspec, ce->name, ce_namelen(ce), 0, NULL))
-			continue;
-		/*
-		 * If CE_VALID is on, we assume worktree file and its cache entry
-		 * are identical, even if worktree file has been modified, so use
-		 * cache version instead
-		 */
-		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
-			if (ce_stage(ce))
-				continue;
-			hit |= grep_sha1(opt, ce->sha1, ce->name, 0, ce->name);
-		}
-		else
-			hit |= grep_file(opt, ce->name);
-		if (ce_stage(ce)) {
-			do {
-				nr++;
-			} while (nr < active_nr &&
-				 !strcmp(ce->name, active_cache[nr]->name));
-			nr--; /* compensate for loop control */
-		}
-		if (hit && opt->status_only)
-			break;
-	}
-	return hit;
+	if (!S_ISREG(ce->ce_mode))
+		return 0;
+	/*
+	 * If CE_VALID is on, we assume worktree file and its cache entry
+	 * are identical, even if worktree file has been modified, so use
+	 * cache version instead
+	 */
+	if (opts->cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce))
+		opts->hit |= grep_sha1(opts->opt, ce->sha1, ce->name, 0, ce->name);
+	else
+		opts->hit |= grep_file(opts->opt, ce->name);
+	if (opts->hit && opts->opt->status_only)
+		return 1;
+	return 0;
 }
 
 static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
@@ -900,10 +890,21 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	} else if (0 <= opt_exclude) {
 		die(_("--[no-]exclude-standard cannot be used for tracked contents."));
 	} else if (!list.nr) {
+		struct grep_opts opts;
+		struct filter_opts *filter_opts = xmalloc(sizeof(*filter_opts));
+
 		if (!cached)
 			setup_work_tree();
 
-		hit = grep_cache(&opt, &pathspec, cached);
+		memset(filter_opts, 0, sizeof(*filter_opts));
+		filter_opts->pathspec = &pathspec;
+		opts.opt = &opt;
+		opts.pathspec = &pathspec;
+		opts.cached = cached;
+		opts.hit = 0;
+		read_cache_filtered(filter_opts);
+		for_each_cache_entry(grep_cache, &opts);
+		hit = opts.hit;
 	} else {
 		if (cached)
 			die(_("both --cached and trees are given."));
-- 
1.8.4.2

* [PATCH v4 09/24] ls-files.c: use index api
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (7 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 08/24] grep.c: use index api Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-30  9:17   ` Duy Nguyen
  2013-11-30 15:39   ` Antoine Pelisse
  2013-11-27 12:00 ` [PATCH v4 10/24] documentation: add documentation of the index-v5 file format Thomas Gummerer
                   ` (15 subsequent siblings)
  24 siblings, 2 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/ls-files.c | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index e1cf6d8..22fb012 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -290,6 +290,22 @@ static void prune_cache(const char *prefix)
 	active_nr = last;
 }
 
+static int needs_trailing_slash_stripped(void)
+{
+	int i;
+
+	if (!pathspec.nr)
+		return 0;
+
+	for (i = 0; i < pathspec.nr; i++) {
+		int len = strlen(pathspec.items[i].original);
+
+		if (len > 1 && (pathspec.items[i].original)[len - 1] == '/')
+			return 1;
+	}
+	return 0;
+}
+
 /*
  * Read the tree specified with --with-tree option
  * (typically, HEAD) into stage #1 and then
@@ -447,6 +463,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	struct dir_struct dir;
 	struct exclude_list *el;
 	struct string_list exclude_list = STRING_LIST_INIT_NODUP;
+	struct filter_opts *opts = xmalloc(sizeof(*opts));
 	struct option builtin_ls_files_options[] = {
 		{ OPTION_CALLBACK, 'z', NULL, NULL, NULL,
 			N_("paths are separated with NUL character"),
@@ -512,9 +529,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		prefix_len = strlen(prefix);
 	git_config(git_default_config, NULL);
 
-	if (read_cache() < 0)
-		die("index file corrupt");
-
 	argc = parse_options(argc, argv, prefix, builtin_ls_files_options,
 			ls_files_usage, 0);
 	el = add_exclude_list(&dir, EXC_CMDL, "--exclude option");
@@ -550,6 +564,24 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		       PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
 		       prefix, argv);
 
+	if (!with_tree && !needs_trailing_slash_stripped()) {
+		memset(opts, 0, sizeof(*opts));
+		opts->pathspec = &pathspec;
+		opts->read_staged = 1;
+		if (show_resolve_undo)
+			opts->read_resolve_undo = 1;
+		if (read_cache_filtered(opts) < 0)
+			die("index file corrupt");
+	} else {
+		if (read_cache() < 0)
+			die("index file corrupt");
+		parse_pathspec(&pathspec, 0,
+			       PATHSPEC_PREFER_CWD |
+			       PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
+			       prefix, argv);
+
+	}
+
 	/* Find common prefix for all pathspec's */
 	max_prefix = common_prefix(&pathspec);
 	max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
-- 
1.8.4.2

* [PATCH v4 10/24] documentation: add documentation of the index-v5 file format
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (8 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 09/24] ls-files.c: " Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 11/24] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Add documentation of the index file format version 5 to
Documentation/technical.
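
As a reading aid, here is a minimal sketch of parsing the fixed part
of the header described in this document.  It assumes the fields
appear in the order listed there (sig, vnr, nfile, ndir, fblockoffset,
nextensions), each as a 32-bit number in network byte order.  It does
not read the extension offsets or verify headercrc, and struct
v5_header, get_be32() and parse_v5_header() are hypothetical names,
not code from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* sig + vnr + nfile + ndir + fblockoffset + nextensions */
#define V5_FIXED_HEADER_SIZE 24

struct v5_header {
	uint32_t version;
	uint32_t nfile;
	uint32_t ndir;
	uint32_t fblockoffset;
	uint32_t nextensions;
};

/* Decode a 32-bit number stored in network (big-endian) byte order. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if buf does not look like a v5 header. */
static int parse_v5_header(const unsigned char *buf, size_t len,
			   struct v5_header *out)
{
	assert(buf && out);
	if (len < V5_FIXED_HEADER_SIZE)
		return -1;
	if (memcmp(buf, "DIRC", 4))
		return -1;
	out->version = get_be32(buf + 4);
	out->nfile = get_be32(buf + 8);
	out->ndir = get_be32(buf + 12);
	out->fblockoffset = get_be32(buf + 16);
	out->nextensions = get_be32(buf + 20);
	if (out->version != 5)
		return -1;
	return 0;
}
```

A full reader would go on to read nextensions extension offsets and
check headercrc over the header and those offsets before trusting any
of the values.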

Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Thomas Rast <trast@student.ethz.ch>
Helped-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Helped-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Documentation/technical/index-file-format-v5.txt | 294 +++++++++++++++++++++++
 1 file changed, 294 insertions(+)
 create mode 100644 Documentation/technical/index-file-format-v5.txt

diff --git a/Documentation/technical/index-file-format-v5.txt b/Documentation/technical/index-file-format-v5.txt
new file mode 100644
index 0000000..1bd8a23
--- /dev/null
+++ b/Documentation/technical/index-file-format-v5.txt
@@ -0,0 +1,294 @@
+GIT index format
+================
+
+== The git index
+
+   The git index file (.git/index) documents the status of the files
+     in the git staging area.
+
+   The staging area is used for preparing commits, merging, etc.
+
+== The git index file format
+
+   All binary numbers are in network byte order. Version 5 is described
+     here. The index file consists of various sections. They appear in
+     the following order in the file.
+
+   - header: the description of the index format, including its signature,
+     version and various other fields that are used internally.
+
+   - diroffsets (ndir entries of "directory offset"): A 4-byte offset
+       relative to the beginning of the "direntries block" (see below)
+       for each of the ndir directories in the index, sorted by pathname
+       (of the directory it's pointing to). [1]
+
+   - direntries (ndir entries of "directory entry"): A directory entry
+       for each of the ndir directories in the index, sorted by pathname
+       (see below). [2]
+
+   - fileoffsets (nfile entries of "file offset"): A 4-byte offset
+       relative to the beginning of the fileentries block (see below)
+       for each of the nfile files in the index. [1]
+
+   - fileentries (nfile entries of "file entry"): A file entry for
+       each of the nfile files in the index (see below).
+
+   - Extensions (Currently REUC, see below for details)
+
+     Extensions are identified by signature. Optional extensions can
+     be ignored if GIT does not understand them.
+
+     GIT supports an arbitrary number of extensions, but currently none
+     is implemented. [3]
+
+     extsig (32-bits): extension signature. If the first byte is 'A'..'Z'
+     the extension is optional and can be ignored.
+
+     extsize (32-bits): number of entries in the extension
+
+     extchecksum (32-bits): crc32 checksum of the extension signature
+       and size.
+
+    - Extension data.
+
+== Header
+   sig (32-bits): Signature:
+     The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")
+
+   vnr (32-bits): Version number:
+     The currently supported versions are 2, 3, 4 and 5.
+
+   nfile (32-bits): number of file entries in the index.
+
+   ndir (32-bits): number of directories in the index.
+
+   fblockoffset (32-bits): offset to the file block, relative to the
+     beginning of the file.
+
+   - Offset to the extensions.
+
+     nextensions (32-bits): number of extensions.
+
+     extoffset (32-bits): offset to the extension. (Possibly none, as
+       many as indicated in the 4-byte number of extensions)
+
+   headercrc (32-bits): crc checksum including the header and the
+     offsets to the extensions.
+
+
+== Directory offsets (diroffsets)
+
+  diroffset (32-bits): offset to the directory relative to the
+    beginning of the index file. There are ndir + 1 offsets in the
+    diroffset table, the last is pointing to the end of the last
+    direntry. With this last entry, we are able to replace the strlen
+    of the directory name when reading the directory name, by
+    calculating it from diroffset[n+1]-diroffset[n]-61.  61 is the
+    size of the directory data, which follows each directory +
+    the crc sum + the NUL byte.
+
+  This part is needed for making the directory entries bisectable and
+    thus allowing a binary search.
+
+== Directory entry (direntries)
+
+  Directory entries are sorted in lexicographic order by the name
+    of their path starting with the root.
+
+  foffset (32-bits): offset to the lexicographically first file in
+    the file offsets (fileoffsets), relative to the beginning of
+    the fileoffset block.
+
+  cr (32-bits): offset to conflicted/resolved data at the end of the
+    index. 0 if there is no such data. [4]
+
+  ncr (32-bits): number of conflicted/resolved data entries at the
+    end of the index if the offset is nonzero. If cr is 0, ncr is
+    also 0.
+
+  nsubtrees (32-bits): number of subtrees this tree has in the index.
+
+  nfiles (32-bits): number of files in the directory, that are in
+    the index.
+
+  nentries (32-bits): number of entries in the index that is covered
+    by the tree this entry represents. (-1 if the entry is invalid).
+    This number includes all the files in this tree, recursively.
+
+  objname (160-bits): object name for the object that would result
+    from writing this span of index as a tree. This is only valid
+    if nentries is valid, meaning the cache-tree is valid.
+
+  flags (32-bits): 'flags' field split into (high to low bits) (For
+    D/F conflicts)
+
+    stage (2-bits): stage of the directory during merge
+
+    30-bit unused
+
+  pathname (variable length, nul terminated): relative to top level
+    directory (without the leading slash). '/' is used as path
+    separator. A string of length 0 ('') indicates the root directory.
+    The special path components ".", and ".." (without quotes) are
+    disallowed. The path also includes a trailing slash. [9]
+
+  dircrc (32-bits): crc32 checksum for each directory entry.
+
+  The last 4-byte number of entries and the 160-bit object name are
+    for the cache tree. An entry can be in an invalidated state which is
+    represented by having -1 in the nentries field.
+
+  The entries are written out in the top-down, depth-first order. The
+    first entry represents the root level of the repository, followed by
+    the first subtree - let's call it A - of the root level, followed by
+    the first subtree of A, ... There is no prefix compression for
+    directories.
+
+== File offsets (fileoffsets)
+
+  fileoffset (32-bits): offset to the file relative to the beginning of
+    the fileentries block.
+
+  This part is needed for making the file entries bisectable and
+    thus allowing a binary search. There are nfile + 1 offsets in the
+    fileoffset table, the last is pointing to the end of the last
+    fileentry. With this last entry, we can replace the strlen when
+    reading each filename, by calculating its length with the offsets.
+
+== File entry (fileentries)
+
+  File entries are sorted in ascending order on the name field, after the
+  respective offset given by the directory entries. All file names are
+  prefix compressed, meaning the file name is relative to the directory.
+
+  flags (16-bits): 'flags' field split into (high to low bits)
+
+    assumevalid (1-bit): assume-valid flag
+
+    intenttoadd (1-bit): intent-to-add flag, used by "git add -N".
+      Extended flag in index v3.
+
+    stage (2-bit): stage of the file during merge
+
+    skipworktree (1-bit): skip-worktree flag, used by sparse checkout.
+      Extended flag in index v3.
+
+    smudged (1-bit): indicates if the file is racily smudged.
+
+    invalid (1-bit): This bit can be set to indicate that a file was
+      deleted, but not yet removed from the index, because the index
+      was only partially rewritten.  Entries with this flag should be
+      ignored when reading the index file.
+
+    9-bit unused, must be zero [6]
+
+  mode (16-bits): file mode, split into (high to low bits)
+
+    objtype (4-bits): object type
+      valid values in binary are 1000 (regular file), 1010 (symbolic
+      link) and 1110 (gitlink)
+
+    3-bit unused
+
+    permission (9-bits): unix permission. Only 0755 and 0644 are valid
+      for regular files. Symbolic links and gitlinks have value 0 in
+      this field.
+
+  mtimes (32-bits): mtime seconds, the last time a file's data changed
+    this is stat(2) data
+
+  mtimens (32-bits): mtime nanosecond fractions
+    this is stat(2) data
+
+  file size (32-bits): The on-disk size, truncated to 32-bit.
+    this is stat(2) data
+
+  statcrc (32-bits): crc32 checksum over ctime seconds, ctime
+    nanoseconds, ino, dev, uid, gid (All stat(2) data
+    except mtime and file size). If the statcrc is 0 it will
+    be ignored. [7]
+
+  objhash (160-bits): SHA-1 for the represented object
+
+  filename (variable length, nul terminated). The exact encoding is
+    undefined, but the filename cannot contain a NUL byte (iow, the same
+    encoding as a UNIX pathname).
+
+  entrycrc (32-bits): crc32 checksum for the file entry.
+
+== Resolve undo extension
+
+  Stores resolved entries and conflicts in the index.  When a conflict
+  is resolved (e.g. with "git add path"), a bit is flipped to indicate
+  the resolution.  This way conflicts can be recreated (e.g. with "git
+  checkout -m"), in case users want to redo a conflict resolution from
+  scratch.
+
+  The conflicts will also be stored in the fileentries part of the index,
+  to simplify reading and writing of the index.
+
+  filename (variable length, nul terminated): filename of the entry,
+    relative to its containing directory.
+
+  nfileconflicts (32-bits): number of conflicts for the file [8]
+
+  flags (nfileconflicts entries of "flags") (16-bits): 'flags' field
+    split into:
+
+    conflicted (1-bit): conflicted state (conflicted/resolved) (1 if
+      conflicted)
+
+    stage (2-bits): stage during merge.
+
+    13-bit unused
+
+  entry_mode (nfileconflicts entries of "entry mode") (16-bits):
+    octal numbers, entry mode of each entry in the different stages.
+    (How many is defined by the 4-byte number before)
+
+  objectnames (nfileconflicts entries of "object name") (160-bits):
+    object names of the different stages.
+
+  conflictcrc (32-bits): crc32 checksum over conflict data.
+
+== Design explanations
+
+[1] The directory and file offsets are included in the index format
+    to enable bisectability of the index, for binary searches. Updating
+    a single entry and partial reading will benefit from this.
+
+[2] The directories are saved in their own block, to be able to
+    quickly search for a directory in the index. They include an
+    offset to the (lexically) first file in the directory.
+
+[3] The data of the cache-tree extension and the resolve undo
+    extension is now part of the index itself, but if other extensions
+    come up in the future, there is no need to change the index, they
+    can simply be added at the end.
+
+[4] To avoid rewrites of the whole index when there are conflicts or
+    conflicts are being resolved, conflicted data will be stored at
+    the end of the index. To mark the conflict resolved, just a bit
+    has to be flipped. The data will still be there, if a user wants
+    to redo the conflict resolution.
+
+[5] Since only 4 modes are effectively allowed in git but 32 bits are
+    used to store them, having a two bit flag for the mode is enough
+    and saves 4 bytes per entry.
+
+[6] The length of the file name was dropped, since each file name is
+    nul terminated anyway.
+
+[7] Since all stat data (except mtime and ctime) is just used for
+    checking if a file has changed, a checksum of the data is enough.
+    In addition to that, Thomas Rast suggested ctime could be ditched
+    completely (core.trustctime=false) and thus included in the
+    checksum. This would save 24 bytes per index entry, which would
+    be about 4 MB on the Webkit index.
+    (Thanks for the suggestion to Michael Haggerty)
+
+[8] Since there can be more than one stage #1 entry, it is necessary
+    to know how many conflict data entries there are.
+
+[9] As Michael Haggerty pointed out on the mailing list, storing the
+    trailing slash will simplify a few operations.
-- 
1.8.4.2


* [PATCH v4 11/24] read-cache: make in-memory format aware of stat_crc
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (9 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 10/24] documentation: add documentation of the index-v5 file format Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 12/24] read-cache: read index-v5 Thomas Gummerer
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Make the in-memory format aware of the stat_crc used by index-v5.
It is simply ignored by index versions prior to v5.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 cache.h      |  1 +
 read-cache.c | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/cache.h b/cache.h
index 38d57e7..8c2ccc4 100644
--- a/cache.h
+++ b/cache.h
@@ -127,6 +127,7 @@ struct cache_entry {
 	unsigned int ce_flags;
 	unsigned int ce_namelen;
 	unsigned char sha1[20];
+	uint32_t ce_stat_crc;
 	struct cache_entry *next; /* used by name_hash */
 	char name[FLEX_ARRAY]; /* more */
 };
diff --git a/read-cache.c b/read-cache.c
index 7020f26..baa052c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -106,6 +106,29 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
 	return changed;
 }
 
+static uint32_t calculate_stat_crc(struct cache_entry *ce)
+{
+	unsigned int ctimens = 0;
+	uint32_t stat, stat_crc;
+
+	stat = htonl(ce->ce_stat_data.sd_ctime.sec);
+	stat_crc = crc32(0, (Bytef*)&stat, 4);
+#ifdef USE_NSEC
+	ctimens = ce->ce_stat_data.sd_ctime.nsec;
+#endif
+	stat = htonl(ctimens);
+	stat_crc = crc32(stat_crc, (Bytef*)&stat, 4);
+	stat = htonl(ce->ce_stat_data.sd_ino);
+	stat_crc = crc32(stat_crc, (Bytef*)&stat, 4);
+	stat = htonl(ce->ce_stat_data.sd_dev);
+	stat_crc = crc32(stat_crc, (Bytef*)&stat, 4);
+	stat = htonl(ce->ce_stat_data.sd_uid);
+	stat_crc = crc32(stat_crc, (Bytef*)&stat, 4);
+	stat = htonl(ce->ce_stat_data.sd_gid);
+	stat_crc = crc32(stat_crc, (Bytef*)&stat, 4);
+	return stat_crc;
+}
+
 /*
  * This only updates the "non-critical" parts of the directory
  * cache, ie the parts that aren't tracked by GIT, and only used
@@ -120,6 +143,8 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st)
 
 	if (S_ISREG(st->st_mode))
 		ce_mark_uptodate(ce);
+
+	ce->ce_stat_crc = calculate_stat_crc(ce);
 }
 
 static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
-- 
1.8.4.2
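
For readers without zlib at hand, the way the stat_crc is chained over
the six fields can be sketched as follows.  This is an illustration
only: the patch itself calls zlib's crc32() on htonl()-converted
values, whereas the sketch uses a small bitwise CRC-32 (same
polynomial) and explicit big-endian encoding.  The names struct
stat_fields, crc32_update and stat_crc_sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the stat data covered by the checksum. */
struct stat_fields {
	uint32_t ctime_sec, ctime_nsec, ino, dev, uid, gid;
};

/*
 * Bitwise CRC-32 (reflected polynomial 0xEDB88320).  It can be chained
 * like zlib's crc32(): pass the previous result as 'crc', starting
 * with 0.
 */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
	crc = ~crc;
	while (len--) {
		int k;

		crc ^= *buf++;
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Store a 32-bit value in network byte order. */
static void put_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/* Checksum over ctime sec, ctime nsec, ino, dev, uid, gid, in that order. */
uint32_t stat_crc_sketch(const struct stat_fields *sf)
{
	uint32_t fields[6];
	uint32_t crc = 0;
	size_t i;

	assert(sf != NULL);
	fields[0] = sf->ctime_sec;
	fields[1] = sf->ctime_nsec;
	fields[2] = sf->ino;
	fields[3] = sf->dev;
	fields[4] = sf->uid;
	fields[5] = sf->gid;
	for (i = 0; i < 6; i++) {
		unsigned char be[4];

		put_be32(be, fields[i]);
		crc = crc32_update(crc, be, sizeof(be));
	}
	return crc;
}
```

Because the CRC is chained, the result is the same as a single CRC-32
over the 24 concatenated big-endian bytes.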


* [PATCH v4 12/24] read-cache: read index-v5
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (10 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 11/24] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-30  9:17   ` Duy Nguyen
                     ` (2 more replies)
  2013-11-27 12:00 ` [PATCH v4 13/24] read-cache: read resolve-undo data Thomas Gummerer
                   ` (12 subsequent siblings)
  24 siblings, 3 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Make git read the index file version 5 without complaining.

This version of the reader reads neither the cache-tree
nor the resolve undo data; however, it won't choke on an
index that includes such data.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Helped-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Makefile        |   1 +
 cache.h         |  32 ++++-
 read-cache-v5.c | 417 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 read-cache.h    |   1 +
 4 files changed, 450 insertions(+), 1 deletion(-)
 create mode 100644 read-cache-v5.c

diff --git a/Makefile b/Makefile
index 5c28777..6a1b054 100644
--- a/Makefile
+++ b/Makefile
@@ -851,6 +851,7 @@ LIB_OBJS += quote.o
 LIB_OBJS += reachable.o
 LIB_OBJS += read-cache.o
 LIB_OBJS += read-cache-v2.o
+LIB_OBJS += read-cache-v5.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += remote.o
diff --git a/cache.h b/cache.h
index 8c2ccc4..65171e4 100644
--- a/cache.h
+++ b/cache.h
@@ -99,7 +99,7 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 #define CACHE_SIGNATURE 0x44495243	/* "DIRC" */
 
 #define INDEX_FORMAT_LB 2
-#define INDEX_FORMAT_UB 4
+#define INDEX_FORMAT_UB 5
 
 /*
  * The "cache_time" is just the low 32 bits of the
@@ -121,6 +121,15 @@ struct stat_data {
 	unsigned int sd_size;
 };
 
+/*
+ * The *next_ce pointer is used in read_entries_v5 for holding
+ * all the elements of a directory, and points to the next
+ * cache_entry in a directory.
+ *
+ * It is reset by the add_name_hash call in set_index_entry
+ * to set it to point to the next cache_entry in the
+ * correct in-memory format ordering.
+ */
 struct cache_entry {
 	struct stat_data ce_stat_data;
 	unsigned int ce_mode;
@@ -132,11 +141,17 @@ struct cache_entry {
 	char name[FLEX_ARRAY]; /* more */
 };
 
+#define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
 #define CE_EXTENDED  (0x4000)
 #define CE_VALID     (0x8000)
+#define CE_SMUDGED   (0x0400) /* index v5 only flag */
 #define CE_STAGESHIFT 12
 
+#define CONFLICT_CONFLICTED (0x8000)
+#define CONFLICT_STAGESHIFT 13
+#define CONFLICT_STAGEMASK (0x6000)
+
 /*
  * Range 0xFFFF0000 in ce_flags is divided into
  * two parts: in-memory flags and on-disk ones.
@@ -173,6 +188,19 @@ struct cache_entry {
 #define CE_EXTENDED_FLAGS (CE_INTENT_TO_ADD | CE_SKIP_WORKTREE)
 
 /*
+ * Representation of the extended on-disk flags in the v5 format.
+ * They must not collide with the ordinary on-disk flags, and need to
+ * fit in 16 bits.  Note however that v5 does not save the name
+ * length.
+ */
+#define CE_INTENT_TO_ADD_V5  (0x4000)
+#define CE_SKIP_WORKTREE_V5  (0x0800)
+#define CE_INVALID_V5        (0x0200)
+#if (CE_VALID|CE_STAGEMASK) & (CE_INTENT_TO_ADD_V5|CE_SKIP_WORKTREE_V5|CE_INVALID_V5)
+#error "v5 on-disk flags collide with ordinary on-disk flags"
+#endif
+
+/*
  * Safeguard to avoid saving wrong flags:
  *  - CE_EXTENDED2 won't get saved until its semantic is known
  *  - Bits in 0x0000FFFF have been saved in ce_flags already
@@ -213,6 +241,8 @@ static inline unsigned create_ce_flags(unsigned stage)
 #define ce_skip_worktree(ce) ((ce)->ce_flags & CE_SKIP_WORKTREE)
 #define ce_mark_uptodate(ce) ((ce)->ce_flags |= CE_UPTODATE)
 
+#define conflict_stage(c) ((CONFLICT_STAGEMASK & (c)->flags) >> CONFLICT_STAGESHIFT)
+
 #define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
 static inline unsigned int create_ce_mode(unsigned int mode)
 {
diff --git a/read-cache-v5.c b/read-cache-v5.c
new file mode 100644
index 0000000..9d8c8f0
--- /dev/null
+++ b/read-cache-v5.c
@@ -0,0 +1,417 @@
+#include "cache.h"
+#include "read-cache.h"
+#include "resolve-undo.h"
+#include "cache-tree.h"
+#include "dir.h"
+#include "pathspec.h"
+
+#define ptr_add(x,y) ((void *)(((char *)(x)) + (y)))
+
+struct cache_header_v5 {
+	uint32_t hdr_ndir;
+	uint32_t hdr_fblockoffset;
+	uint32_t hdr_nextension;
+};
+
+struct directory_entry {
+	struct directory_entry **sub;
+	struct directory_entry *next;
+	struct directory_entry *next_hash;
+	struct cache_entry *ce;
+	struct cache_entry *ce_last;
+	uint32_t conflict_size;
+	uint32_t de_foffset;
+	uint32_t de_nsubtrees;
+	uint32_t de_nfiles;
+	uint32_t de_nentries;
+	unsigned char sha1[20];
+	uint16_t de_flags;
+	uint32_t de_pathlen;
+	char pathname[FLEX_ARRAY];
+};
+
+struct conflict_part {
+	struct conflict_part *next;
+	uint16_t flags;
+	uint16_t entry_mode;
+	unsigned char sha1[20];
+};
+
+struct conflict_entry {
+	struct conflict_entry *next;
+	uint32_t nfileconflicts;
+	struct conflict_part *entries;
+	uint32_t namelen;
+	uint32_t pathlen;
+	char name[FLEX_ARRAY];
+};
+
+/*****************************************************************
+ * Index File I/O
+ *****************************************************************/
+
+struct ondisk_cache_entry {
+	uint16_t flags;
+	uint16_t mode;
+	struct cache_time mtime;
+	uint32_t size;
+	uint32_t stat_crc;
+	unsigned char sha1[20];
+	char name[FLEX_ARRAY];
+};
+
+struct ondisk_directory_entry {
+	uint32_t foffset;
+	uint32_t nsubtrees;
+	uint32_t nfiles;
+	uint32_t nentries;
+	unsigned char sha1[20];
+	uint32_t flags;
+	char name[FLEX_ARRAY];
+};
+#define directory_entry_size(len) (offsetof(struct directory_entry,pathname) + (len) + 1)
+#define conflict_entry_size(len) (offsetof(struct conflict_entry,name) + (len) + 1)
+
+static int check_crc32(int initialcrc, void *data,
+		       size_t len, unsigned int expected_crc)
+{
+	int crc;
+
+	crc = crc32(initialcrc, (Bytef*)data, len);
+	return crc == expected_crc;
+}
+
+static int match_stat_crc(struct stat *st, uint32_t expected_crc)
+{
+	uint32_t data, stat_crc = 0;
+	unsigned int ctimens = 0;
+
+	data = htonl(st->st_ctime);
+	stat_crc = crc32(0, (Bytef*)&data, 4);
+#ifdef USE_NSEC
+	ctimens = ST_CTIME_NSEC(*st);
+#endif
+	data = htonl(ctimens);
+	stat_crc = crc32(stat_crc, (Bytef*)&data, 4);
+	data = htonl(st->st_ino);
+	stat_crc = crc32(stat_crc, (Bytef*)&data, 4);
+	data = htonl(st->st_dev);
+	stat_crc = crc32(stat_crc, (Bytef*)&data, 4);
+	data = htonl(st->st_uid);
+	stat_crc = crc32(stat_crc, (Bytef*)&data, 4);
+	data = htonl(st->st_gid);
+	stat_crc = crc32(stat_crc, (Bytef*)&data, 4);
+
+	return stat_crc == expected_crc;
+}
+
+static int match_stat_basic(const struct cache_entry *ce,
+			    struct stat *st,
+			    int changed)
+{
+
+	if (ce->ce_stat_data.sd_mtime.sec != (unsigned int)st->st_mtime)
+		changed |= MTIME_CHANGED;
+#ifdef USE_NSEC
+	if (ce->ce_stat_data.sd_mtime.nsec != ST_MTIME_NSEC(*st))
+		changed |= MTIME_CHANGED;
+#endif
+	if (ce->ce_stat_data.sd_size != (unsigned int)st->st_size)
+		changed |= DATA_CHANGED;
+
+	if (trust_ctime && ce->ce_stat_crc != 0 && !match_stat_crc(st, ce->ce_stat_crc)) {
+		changed |= OWNER_CHANGED;
+		changed |= INODE_CHANGED;
+	}
+	/* Racily smudged entry? */
+	if (ce->ce_flags & CE_SMUDGED) {
+		if (!changed && !is_empty_blob_sha1(ce->sha1) && ce_modified_check_fs(ce, st))
+			changed |= DATA_CHANGED;
+	}
+	return changed;
+}
+
+static int verify_hdr(void *mmap, unsigned long size)
+{
+	uint32_t *filecrc;
+	unsigned int header_size;
+	struct cache_header *hdr;
+	struct cache_header_v5 *hdr_v5;
+
+	if (size < sizeof(struct cache_header)
+	    + sizeof (struct cache_header_v5) + 4)
+		die("index file smaller than expected");
+
+	hdr = mmap;
+	hdr_v5 = ptr_add(mmap, sizeof(*hdr));
+	/* Size of the header + the size of the extensionoffsets */
+	header_size = sizeof(*hdr) + sizeof(*hdr_v5) + hdr_v5->hdr_nextension * 4;
+	/* Initialize crc */
+	filecrc = ptr_add(mmap, header_size);
+	if (!check_crc32(0, hdr, header_size, ntohl(*filecrc)))
+		return error("bad index file header crc signature");
+	return 0;
+}
+
+static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *ondisk,
+						   char *pathname, size_t len,
+						   size_t pathlen)
+{
+	struct cache_entry *ce;
+	int flags;
+
+	flags = ntoh_s(ondisk->flags);
+	/*
+	 * This entry was invalidated in the index file,
+	 * we don't need any data from it
+	 */
+	if (flags & CE_INVALID_V5)
+		return NULL;
+	ce = xcalloc(1, cache_entry_size(len + pathlen));
+	ce->ce_stat_data.sd_mtime.sec  = ntoh_l(ondisk->mtime.sec);
+	ce->ce_stat_data.sd_mtime.nsec = ntoh_l(ondisk->mtime.nsec);
+	ce->ce_stat_data.sd_size       = ntoh_l(ondisk->size);
+	ce->ce_mode       = ntoh_s(ondisk->mode);
+	ce->ce_flags      = flags & CE_STAGEMASK;
+	ce->ce_flags     |= flags & CE_VALID;
+	ce->ce_flags     |= flags & CE_SMUDGED;
+	if (flags & CE_INTENT_TO_ADD_V5)
+		ce->ce_flags |= CE_INTENT_TO_ADD;
+	if (flags & CE_SKIP_WORKTREE_V5)
+		ce->ce_flags |= CE_SKIP_WORKTREE;
+	ce->ce_stat_crc   = ntoh_l(ondisk->stat_crc);
+	ce->ce_namelen    = len + pathlen;
+	hashcpy(ce->sha1, ondisk->sha1);
+	memcpy(ce->name, pathname, pathlen);
+	memcpy(ce->name + pathlen, ondisk->name, len);
+	ce->name[len + pathlen] = '\0';
+	return ce;
+}
+
+static struct directory_entry *init_directory_entry(const char *pathname, int len)
+{
+	struct directory_entry *de = xmalloc(directory_entry_size(len));
+
+	memset(de, 0, directory_entry_size(len));
+	memcpy(de->pathname, pathname, len);
+	de->de_pathlen = len;
+	return de;
+}
+
+static struct directory_entry *directory_entry_from_ondisk(struct ondisk_directory_entry *ondisk,
+							   size_t len)
+{
+	struct directory_entry *de = init_directory_entry(ondisk->name, len);
+
+	de->de_flags      = ntoh_s(ondisk->flags);
+	de->de_foffset    = ntoh_l(ondisk->foffset);
+	de->de_nsubtrees  = ntoh_l(ondisk->nsubtrees);
+	de->de_nfiles     = ntoh_l(ondisk->nfiles);
+	de->de_nentries   = ntoh_l(ondisk->nentries);
+	de->de_pathlen    = len;
+	hashcpy(de->sha1, ondisk->sha1);
+	return de;
+}
+
+/*
+ * Read the directories recursively into a directory tree.  dir_offset
+ * is the current offset to the directory to be read in the direntries
+ * block, while dir_table_offset is the current offset for the directory
+ * in the diroffsets block.
+ */
+static struct directory_entry *read_directories(unsigned int *dir_offset,
+						unsigned int *dir_table_offset,
+						void *mmap, int mmap_size)
+{
+	uint32_t *filecrc, *beginning, *end;
+	struct ondisk_directory_entry *disk_de;
+	struct directory_entry *de;
+	unsigned int data_len, len, i;
+
+	beginning = ptr_add(mmap, *dir_table_offset);
+	end = ptr_add(mmap, *dir_table_offset + 4);
+	/* Calculate the namelen from the offsets (-5 = NUL byte + crc checksum) */
+	len = ntoh_l(*end) - ntoh_l(*beginning) -
+		offsetof(struct ondisk_directory_entry, name) - 5;
+	disk_de = ptr_add(mmap, *dir_offset);
+	de = directory_entry_from_ondisk(disk_de, len);
+
+	data_len = len + 1 + offsetof(struct ondisk_directory_entry, name);
+	filecrc = ptr_add(mmap, *dir_offset + data_len);
+	if (!check_crc32(0, ptr_add(mmap, *dir_offset), data_len, ntoh_l(*filecrc)))
+		die("directory crc doesn't match for '%s'", de->pathname);
+
+	*dir_table_offset += 4;
+	*dir_offset += data_len + 4; /* crc code */
+
+	de->sub = xcalloc(de->de_nsubtrees, sizeof(struct directory_entry *));
+	for (i = 0; i < de->de_nsubtrees; i++) {
+		de->sub[i] = read_directories(dir_offset, dir_table_offset,
+						   mmap, mmap_size);
+	}
+
+	return de;
+}
+
+static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
+		      void *mmap, unsigned long mmap_size,
+		      unsigned int first_entry_offset,
+		      unsigned int foffsetblock)
+{
+	int len;
+	uint32_t *filecrc, *beginning, *end, entry_offset;
+	struct ondisk_cache_entry *disk_ce;
+
+	beginning = ptr_add(mmap, foffsetblock);
+	end = ptr_add(mmap, foffsetblock + 4);
+	/* Calculate the namelen from the offsets (-5 = NUL byte + crc checksum) */
+	len = ntoh_l(*end) - ntoh_l(*beginning) -
+		offsetof(struct ondisk_cache_entry, name) - 5;
+	entry_offset = first_entry_offset + ntoh_l(*beginning);
+	disk_ce = ptr_add(mmap, entry_offset);
+	*ce = cache_entry_from_ondisk(disk_ce, pathname, len, pathlen);
+	filecrc = ptr_add(mmap, entry_offset + len + 1 + sizeof(*disk_ce));
+	if (!check_crc32(0,
+		ptr_add(mmap, entry_offset), len + 1 + sizeof(*disk_ce),
+		ntoh_l(*filecrc)))
+		return -1;
+
+	return 0;
+}
+
+/*
+ * Read all file entries from the index.  This function is recursive to get
+ * the ordering right. In the index file the entries are sorted def, abc/def,
+ * abc/xyz, while in-core they are sorted abc/def, abc/xyz, def.
+ */
+static int read_entries(struct index_state *istate, struct directory_entry *de,
+			unsigned int first_entry_offset, void *mmap,
+			unsigned long mmap_size, unsigned int *nr,
+			unsigned int foffsetblock)
+{
+	struct cache_entry *ce;
+	int i, subdir = 0;
+
+	for (i = 0; i < de->de_nfiles; i++) {
+		unsigned int subdir_foffsetblock = de->de_foffset + foffsetblock + (i * 4);
+		if (read_entry(&ce, de->pathname, de->de_pathlen, mmap, mmap_size,
+			       first_entry_offset, subdir_foffsetblock) < 0)
+			return -1;
+		while (subdir < de->de_nsubtrees &&
+		       cache_name_compare(ce->name + de->de_pathlen,
+					  ce_namelen(ce) - de->de_pathlen,
+					  de->sub[subdir]->pathname + de->de_pathlen,
+					  de->sub[subdir]->de_pathlen - de->de_pathlen) > 0) {
+			read_entries(istate, de->sub[subdir], first_entry_offset, mmap,
+				     mmap_size, nr, foffsetblock);
+			subdir++;
+		}
+		if (!ce)
+			continue;
+		set_index_entry(istate, (*nr)++, ce);
+	}
+	for (i = subdir; i < de->de_nsubtrees; i++) {
+		read_entries(istate, de->sub[i], first_entry_offset, mmap,
+			     mmap_size, nr, foffsetblock);
+	}
+	return 0;
+}
+
+static void free_directory_tree(struct directory_entry *de) {
+	int i;
+
+	for (i = 0; i < de->de_nsubtrees; i++)
+		free_directory_tree(de->sub[i]);
+	free(de);
+}
+
+/*
+ * Read an index-v5 file filtered by the filter_opts.   If opts is NULL,
+ * everything will be read.
+ */
+static int read_index_v5(struct index_state *istate, void *mmap,
+			 unsigned long mmap_size, struct filter_opts *opts)
+{
+	unsigned int entry_offset, foffsetblock, nr = 0, *extoffsets;
+	unsigned int dir_offset, dir_table_offset;
+	int need_root = 0, i;
+	uint32_t *offset;
+	struct directory_entry *root_directory, *de, *last_de;
+	const char **paths = NULL;
+	struct pathspec adjusted_pathspec;
+	struct cache_header *hdr;
+	struct cache_header_v5 *hdr_v5;
+
+	hdr = mmap;
+	hdr_v5 = ptr_add(mmap, sizeof(*hdr));
+	istate->cache_alloc = alloc_nr(ntohl(hdr->hdr_entries));
+	istate->cache = xcalloc(istate->cache_alloc, sizeof(struct cache_entry *));
+	extoffsets = xcalloc(ntohl(hdr_v5->hdr_nextension), sizeof(int));
+	for (i = 0; i < ntohl(hdr_v5->hdr_nextension); i++) {
+		offset = ptr_add(mmap, sizeof(*hdr) + sizeof(*hdr_v5) + i * 4);
+		extoffsets[i] = ntohl(*offset);
+	}
+
+	/* Skip size of the header + crc sum + size of offsets to extensions + size of offsets */
+	dir_offset = sizeof(*hdr) + sizeof(*hdr_v5) + ntohl(hdr_v5->hdr_nextension) * 4 + 4
+		+ (ntohl(hdr_v5->hdr_ndir) + 1) * 4;
+	dir_table_offset = sizeof(*hdr) + sizeof(*hdr_v5) + ntohl(hdr_v5->hdr_nextension) * 4 + 4;
+	root_directory = read_directories(&dir_offset, &dir_table_offset,
+					  mmap, mmap_size);
+
+	entry_offset = ntohl(hdr_v5->hdr_fblockoffset);
+	foffsetblock = dir_offset;
+
+	if (opts && opts->pathspec && opts->pathspec->nr) {
+		paths = xmalloc((opts->pathspec->nr + 1)*sizeof(char *));
+		paths[opts->pathspec->nr] = NULL;
+		for (i = 0; i < opts->pathspec->nr; i++) {
+			char *super = strdup(opts->pathspec->items[i].match);
+			int len = strlen(super);
+			while (len > 1 && super[len - 1] == '/' && super[len - 2] == '/')
+				super[--len] = '\0'; /* strip all but one trailing slash */
+			while (len && super[--len] != '/')
+				; /* scan backwards to next / */
+			if (len >= 0)
+				super[len--] = '\0';
+			if (len <= 0) {
+				need_root = 1;
+				break;
+			}
+			paths[i] = super;
+		}
+	}
+
+	if (!need_root)
+		parse_pathspec(&adjusted_pathspec, PATHSPEC_ALL_MAGIC, PATHSPEC_PREFER_CWD, NULL, paths);
+
+	de = root_directory;
+	last_de = de;
+	while (de) {
+		if (need_root ||
+		    match_pathspec_depth(&adjusted_pathspec, de->pathname, de->de_pathlen, 0, NULL)) {
+			if (read_entries(istate, de, entry_offset,
+					 mmap, mmap_size, &nr,
+					 foffsetblock) < 0)
+				return -1;
+		} else {
+			last_de = de;
+			for (i = 0; i < de->de_nsubtrees; i++) {
+				de->sub[i]->next = last_de->next;
+				last_de->next = de->sub[i];
+				last_de = last_de->next;
+			}
+		}
+		de = de->next;
+	}
+	free_directory_tree(root_directory);
+	istate->cache_nr = nr;
+	return 0;
+}
+
+struct index_ops v5_ops = {
+	match_stat_basic,
+	verify_hdr,
+	read_index_v5,
+	NULL
+};
diff --git a/read-cache.h b/read-cache.h
index f920546..7823fbb 100644
--- a/read-cache.h
+++ b/read-cache.h
@@ -34,6 +34,7 @@ struct index_ops {
 };
 
 extern struct index_ops v2_ops;
+extern struct index_ops v5_ops;
 
 #ifndef NEEDS_ALIGNED_ACCESS
 #define ntoh_s(var) ntohs(var)
-- 
1.8.4.2
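
The "-5 = NUL byte + crc checksum" computation used by read_entry and
read_directories above derives a name length from two adjacent offsets
in the offset table.  A minimal sketch of that arithmetic, assuming a
table of big-endian 32-bit offsets and a known fixed-size part in
front of the name (40 bytes for a file entry: 2 + 2 + 4 + 4 + 4 + 4 +
20), might look like this.  get_be32 and entry_name_len are
hypothetical helpers, and the bounds check is an addition that the
patch does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian value without alignment requirements. */
uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Length of the name of entry 'i', computed from the offsets of entry
 * 'i' and entry 'i + 1'.  'fixed_size' is the number of bytes in front
 * of the name.  Each entry also carries a NUL terminator (1 byte) and
 * a trailing crc (4 bytes).  Returns -1 if the offsets are
 * inconsistent.
 */
long entry_name_len(const unsigned char *offset_table, size_t i,
		    uint32_t fixed_size)
{
	uint32_t begin, end, overhead;

	assert(offset_table != NULL);
	begin = get_be32(offset_table + 4 * i);
	end = get_be32(offset_table + 4 * (i + 1));
	overhead = fixed_size + 1 + 4;
	if (end < begin || end - begin < overhead)
		return -1;
	return (long)(end - begin - overhead);
}
```

This is why the table holds one more offset than there are entries:
the extra offset marks the end of the last entry, so its name length
can be computed in the same way.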


* [PATCH v4 13/24] read-cache: read resolve-undo data
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (11 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 12/24] read-cache: read index-v5 Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 14/24] read-cache: read cache-tree in index-v5 Thomas Gummerer
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Make git read the resolve-undo data from the index.

Since the resolve-undo data is joined with the conflicts in
the ondisk format of the index file version 5, conflicts and
resolved data are read at the same time, and the resolve-undo
data is then converted to the in-memory format.

Helped-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 read-cache-v5.c | 160 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 157 insertions(+), 3 deletions(-)

diff --git a/read-cache-v5.c b/read-cache-v5.c
index 9d8c8f0..a9c687f 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "read-cache.h"
+#include "string-list.h"
 #include "resolve-undo.h"
 #include "cache-tree.h"
 #include "dir.h"
@@ -13,13 +14,18 @@ struct cache_header_v5 {
 	uint32_t hdr_nextension;
 };
 
+struct extension_header {
+	char signature[4];
+	uint32_t size;
+	uint32_t crc;
+};
+
 struct directory_entry {
 	struct directory_entry **sub;
 	struct directory_entry *next;
 	struct directory_entry *next_hash;
 	struct cache_entry *ce;
 	struct cache_entry *ce_last;
-	uint32_t conflict_size;
 	uint32_t de_foffset;
 	uint32_t de_nsubtrees;
 	uint32_t de_nfiles;
@@ -42,7 +48,6 @@ struct conflict_entry {
 	uint32_t nfileconflicts;
 	struct conflict_part *entries;
 	uint32_t namelen;
-	uint32_t pathlen;
 	char name[FLEX_ARRAY];
 };
 
@@ -50,6 +55,12 @@ struct conflict_entry {
  * Index File I/O
  *****************************************************************/
 
+struct ondisk_conflict_part {
+	uint16_t flags;
+	uint16_t entry_mode;
+	unsigned char sha1[20];
+};
+
 struct ondisk_cache_entry {
 	uint16_t flags;
 	uint16_t mode;
@@ -145,7 +156,7 @@ static int verify_hdr(void *mmap, unsigned long size)
 	hdr = mmap;
 	hdr_v5 = ptr_add(mmap, sizeof(*hdr));
 	/* Size of the header + the size of the extensionoffsets */
-	header_size = sizeof(*hdr) + sizeof(*hdr_v5) + hdr_v5->hdr_nextension * 4;
+	header_size = sizeof(*hdr) + sizeof(*hdr_v5) + ntohl(hdr_v5->hdr_nextension) * 4;
 	/* Initialize crc */
 	filecrc = ptr_add(mmap, header_size);
 	if (!check_crc32(0, hdr, header_size, ntohl(*filecrc)))
@@ -279,6 +290,134 @@ static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
 	return 0;
 }
 
+static struct conflict_part *conflict_part_from_ondisk(struct ondisk_conflict_part *ondisk)
+{
+	struct conflict_part *cp = xmalloc(sizeof(struct conflict_part));
+
+	cp->flags      = ntoh_s(ondisk->flags);
+	cp->entry_mode = ntoh_s(ondisk->entry_mode);
+	hashcpy(cp->sha1, ondisk->sha1);
+	return cp;
+}
+
+struct conflict_entry *create_new_conflict(char *name, int len)
+{
+	struct conflict_entry *conflict_entry;
+
+	conflict_entry = xmalloc(conflict_entry_size(len));
+	memset(conflict_entry, 0, conflict_entry_size(len));
+	conflict_entry->namelen = len;
+	memcpy(conflict_entry->name, name, len);
+
+	return conflict_entry;
+}
+
+static void add_part_to_conflict_entry(struct conflict_entry *entry,
+				       struct conflict_part *conflict_part)
+{
+
+	struct conflict_part *conflict_search;
+
+	entry->nfileconflicts++;
+	if (!entry->entries)
+		entry->entries = conflict_part;
+	else {
+		conflict_search = entry->entries;
+		while (conflict_search->next)
+			conflict_search = conflict_search->next;
+		conflict_search->next = conflict_part;
+	}
+}
+
+/*
+ * Read the resolve undo data on disk and convert it to the internal
+ * resolve undo format.
+ */
+static int read_resolve_undo(struct index_state *istate,
+			     unsigned int offset, void *mmap,
+			     unsigned int entries)
+{
+	int i, k;
+
+	for (i = 0; i < entries; i++) {
+		char *name;
+		unsigned int len, *nfileconflicts, nc;
+		uint32_t *crc;
+		struct ondisk_conflict_part *ondisk;
+		struct conflict_part *cp;
+		struct string_list_item *lost;
+		struct resolve_undo_info *ui;
+
+		name = ptr_add(mmap, offset);
+		len = strlen(name);
+		offset += len + 1;
+		nfileconflicts = ptr_add(mmap, offset);
+		nc = ntoh_l(*nfileconflicts);
+		offset += 4;
+
+		crc = ptr_add(mmap, offset +
+			      nc * sizeof(struct ondisk_conflict_part));
+		if (!check_crc32(0, name, len + 1 + 4 +
+				 nc * sizeof(struct ondisk_conflict_part),
+				 ntoh_l(*crc)))
+			return -1;
+
+		ondisk = ptr_add(mmap, offset);
+		cp = conflict_part_from_ondisk(ondisk);
+		if (cp->flags & CONFLICT_CONFLICTED) {
+			offset += nc * sizeof(struct ondisk_conflict_part) + 4;
+			continue;
+		}
+		offset += sizeof(struct ondisk_conflict_part);
+		if (!istate->resolve_undo) {
+			istate->resolve_undo = xcalloc(1, sizeof(struct string_list));
+			istate->resolve_undo->strdup_strings = 1;
+		}
+
+		lost = string_list_insert(istate->resolve_undo, name);
+		if (!lost->util)
+			lost->util = xcalloc(1, sizeof(*ui));
+		ui = lost->util;
+		for (k = 0; k < 3; k++)
+			ui->mode[k] = 0;
+
+		ui->mode[conflict_stage(cp) - 1] = cp->entry_mode;
+		hashcpy(ui->sha1[conflict_stage(cp) - 1], cp->sha1);
+		for (k = 1; k < nc; k++) {
+			struct conflict_part *cp;
+
+			ondisk = ptr_add(mmap, offset);
+			cp = conflict_part_from_ondisk(ondisk);
+			ui->mode[conflict_stage(cp) - 1] = cp->entry_mode;
+			hashcpy(ui->sha1[conflict_stage(cp) - 1], cp->sha1);
+			offset += sizeof(struct ondisk_conflict_part);
+		}
+		offset += 4; /* crc */
+	}
+	return 0;
+}
+
+static int read_index_extension(struct index_state *istate,
+				void *mmap, unsigned int extoffset)
+{
+	struct extension_header *ehdr;
+
+	ehdr = ptr_add(mmap, extoffset);
+	/* -4 for the crc that's included in the struct */
+	if (!check_crc32(0, ptr_add(mmap, extoffset),
+			 sizeof(*ehdr) - 4, ntoh_l(ehdr->crc)))
+		return -1;
+
+	switch (CACHE_EXT(ehdr->signature)) {
+	case CACHE_EXT_RESOLVE_UNDO:
+		if (read_resolve_undo(istate, extoffset + sizeof(*ehdr),
+				      mmap, ntoh_l(ehdr->size)) < 0)
+			return -1;
+		break;
+	}
+	return 0;
+}
+
 /*
  * Read all file entries from the index.  This function is recursive to get
  * the ordering right. In the index file the entries are sorted def, abc/def,
@@ -404,6 +543,21 @@ static int read_index_v5(struct index_state *istate, void *mmap,
 		}
 		de = de->next;
 	}
+
+	if (!opts || opts->read_resolve_undo) {
+		for (i = 0; i < ntohl(hdr_v5->hdr_nextension); i++) {
+			/*
+			 * After the index entry there is a number of
+			 * extensions, which is written in the header.
+			 * The extensions are prefixed by extension name
+			 * (4-byte) and length of the extension (4-byte,
+			 * usually the number of entries in that section)
+			 * in network byte order
+			 */
+			if (read_index_extension(istate, mmap, extoffsets[i]) < 0)
+				return -1;
+		}
+	}
 	free_directory_tree(root_directory);
 	istate->cache_nr = nr;
 	return 0;
-- 
1.8.4.2
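
The conversion done by read_resolve_undo above can be summarised in a
reduced form: the stage is extracted from each conflict part's flags
and used, minus one, as the index into the three resolve-undo slots.
The following is only a sketch.  The mask values are the ones added to
cache.h in patch 12, while struct part, struct undo_modes and
collect_undo_modes are hypothetical, the object names are omitted, and
the check for an out-of-range stage is an addition that the patch does
not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CONFLICT_CONFLICTED 0x8000u
#define CONFLICT_STAGESHIFT 13
#define CONFLICT_STAGEMASK  0x6000u

/* Hypothetical reduced conflict part: flags and entry mode only. */
struct part {
	uint16_t flags;
	uint16_t entry_mode;
};

/* Hypothetical reduced resolve-undo record: one mode per stage 1..3. */
struct undo_modes {
	unsigned int mode[3];
};

unsigned part_stage(uint16_t flags)
{
	return (flags & CONFLICT_STAGEMASK) >> CONFLICT_STAGESHIFT;
}

/*
 * Returns 1 if the entry is resolved and its modes were recorded,
 * 0 if the entry is still conflicted (or empty) and was skipped,
 * -1 if a part carries a stage outside 1..3.
 */
int collect_undo_modes(const struct part *parts, size_t n,
		       struct undo_modes *out)
{
	size_t i;

	assert(out != NULL);
	memset(out, 0, sizeof(*out));
	if (n == 0 || (parts[0].flags & CONFLICT_CONFLICTED))
		return 0;
	for (i = 0; i < n; i++) {
		unsigned stage = part_stage(parts[i].flags);

		if (stage < 1 || stage > 3)
			return -1;
		out->mode[stage - 1] = parts[i].entry_mode;
	}
	return 1;
}
```

As in the patch, only the first part's conflicted bit decides whether
the whole entry is treated as still conflicted.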


* [PATCH v4 14/24] read-cache: read cache-tree in index-v5
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (12 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 13/24] read-cache: read resolve-undo data Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 15/24] read-cache: write index-v5 Thomas Gummerer
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Since the cache-tree data is saved as part of the directory data,
we already read it at the beginning of the index. The cache-tree
is only converted from this directory data.

The cache-tree data is arranged in a tree, with the children sorted by
pathlen at each node, while the ondisk format is sorted lexically.
So we have to rebuild this format from the on-disk directory list.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
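To illustrate the ordering mentioned above, here is a minimal standalone
sketch (not code from this patch). It assumes a length-first comparison
with the same shape as subtree_name_cmp(); the helper names
name_cmp_len_first(), cmp_names() and sort_children() are made up for
the example.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Shorter names sort first; names of equal length are compared
 * bytewise.  This is the assumed shape of subtree_name_cmp().
 */
static int name_cmp_len_first(const char *one, size_t onelen,
			      const char *two, size_t twolen)
{
	if (onelen < twolen)
		return -1;
	if (twolen < onelen)
		return 1;
	return memcmp(one, two, onelen);
}

/* qsort() adapter for an array of NUL-terminated names */
static int cmp_names(const void *a, const void *b)
{
	const char *one = *(const char *const *)a;
	const char *two = *(const char *const *)b;

	return name_cmp_len_first(one, strlen(one), two, strlen(two));
}

/* Re-sort children that arrive in on-disk (lexical) order */
static void sort_children(const char **names, size_t nr)
{
	qsort(names, nr, sizeof(*names), cmp_names);
}
```

Given the children "Documentation", "builtin", "lib", "t" in lexical
order, sort_children() leaves them as "t", "lib", "builtin",
"Documentation", which is why the qsort() in convert_one() is needed.
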
 cache-tree.c    |  2 +-
 cache-tree.h    |  1 +
 read-cache-v5.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 0bbec43..1209732 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -31,7 +31,7 @@ void cache_tree_free(struct cache_tree **it_p)
 	*it_p = NULL;
 }
 
-static int subtree_name_cmp(const char *one, int onelen,
+int subtree_name_cmp(const char *one, int onelen,
 			    const char *two, int twolen)
 {
 	if (onelen < twolen)
diff --git a/cache-tree.h b/cache-tree.h
index f1923ad..9818926 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -25,6 +25,7 @@ struct cache_tree *cache_tree(void);
 void cache_tree_free(struct cache_tree **);
 void cache_tree_invalidate_path(struct cache_tree *, const char *);
 struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
+int subtree_name_cmp(const char *, int, const char *, int);
 
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
 struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
diff --git a/read-cache-v5.c b/read-cache-v5.c
index a9c687f..01f1c88 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -418,6 +418,73 @@ static int read_index_extension(struct index_state *istate,
 	return 0;
 }
 
+static int compare_cache_tree(const void *a, const void *b)
+{
+	const struct cache_tree_sub *it1, *it2;
+
+	it1 = *(const struct cache_tree_sub **) a;
+	it2 = *(const struct cache_tree_sub **) b;
+	return subtree_name_cmp(it1->name, it1->namelen,
+				it2->name, it2->namelen);
+}
+
+/*
+ * Convert the directory entries to cache-tree entries
+ * recursively.
+ */
+static struct cache_tree *convert_one(struct directory_entry *de)
+{
+	int i;
+	struct cache_tree *it;
+
+	it = cache_tree();
+	it->entry_count = de->de_nentries;
+	if (0 <= it->entry_count)
+		hashcpy(it->sha1, de->sha1);
+
+	/*
+	 * Just a heuristic -- we do not add directories that often but
+	 * we do not want to have to extend it immediately when we do,
+	 * hence +2.
+	 */
+	it->subtree_alloc = de->de_nsubtrees + 2;
+	it->down = xcalloc(it->subtree_alloc, sizeof(struct cache_tree_sub *));
+	for (i = 0; i < de->de_nsubtrees; i++) {
+		struct cache_tree *sub = convert_one(de->sub[i]);
+		struct cache_tree_sub *subtree;
+		/* -1 for removing the / at the end of the pathname */
+		int namelen = de->sub[i]->de_pathlen - de->de_pathlen - 1;
+
+		if (!sub)
+			goto free_return;
+
+		subtree = xmalloc(sizeof(*subtree) + namelen + 1);
+		subtree->cache_tree = sub;
+		subtree->namelen = namelen;
+		memcpy(subtree->name, de->sub[i]->pathname + de->de_pathlen, namelen);
+		subtree->name[namelen] = '\0';
+		it->down[i] = subtree;
+		it->subtree_nr++;
+	}
+	qsort(it->down, it->subtree_nr, sizeof(struct cache_tree_sub *),
+	      compare_cache_tree);
+	return it;
+free_return:
+	cache_tree_free(&it);
+	return NULL;
+}
+
+/*
+ * This function modifies the directory argument that is given to it.
+ * Don't use it if the directory entries are still needed after.
+ */
+static struct cache_tree *cache_tree_convert_v5(struct directory_entry *de)
+{
+	if (!de->de_nentries)
+		return NULL;
+	return convert_one(de);
+}
+
 /*
  * Read all file entries from the index.  This function is recursive to get
  * the ordering right. In the index file the entries are sorted def, abc/def,
@@ -558,6 +625,7 @@ static int read_index_v5(struct index_state *istate, void *mmap,
 				return -1;
 		}
 	}
+	istate->cache_tree = cache_tree_convert_v5(root_directory);
 	free_directory_tree(root_directory);
 	istate->cache_nr = nr;
 	return 0;
-- 
1.8.4.2

* [PATCH v4 15/24] read-cache: write index-v5
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (13 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 14/24] read-cache: read cache-tree in index-v5 Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 16/24] read-cache: write index-v5 cache-tree data Thomas Gummerer
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Write the index version 5 file format to disk. This version doesn't
write the cache-tree data and resolve-undo data to the file.

The main work is done when filtering out the directories from the
current in-memory format; the conflicts and the file data are
calculated in the same pass.

Helped-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Helped-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
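As a rough illustration of the directory filtering step, here is a
minimal standalone sketch (not code from this patch). It assumes that
each path belongs to the directory named by everything before its last
'/', with top-level files belonging to the empty root directory. The
helper names dirname_len(), in_directory() and count_files_in() are
made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Length of the directory part of path; 0 for a top-level file */
static size_t dirname_len(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? (size_t)(slash - path) : 0;
}

/* Does path live directly in dir (no trailing slash, "" is the root)? */
static int in_directory(const char *path, const char *dir, size_t dir_len)
{
	return dirname_len(path) == dir_len && !memcmp(path, dir, dir_len);
}

/* Count the files directly in dir, like de_nfiles does per directory */
static unsigned int count_files_in(const char *const *paths, size_t nr,
				   const char *dir)
{
	size_t i, dir_len = strlen(dir);
	unsigned int nfiles = 0;

	for (i = 0; i < nr; i++)
		if (in_directory(paths[i], dir, dir_len))
			nfiles++;
	return nfiles;
}
```

The patch itself does this in one pass over the sorted cache entries
and looks directories up in a hash table instead of rescanning the
list; the sketch only shows how a path maps to its directory.
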
 cache.h         |   1 +
 read-cache-v5.c | 431 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 read-cache.c    |   4 +-
 read-cache.h    |   1 +
 4 files changed, 435 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 65171e4..71b98cf 100644
--- a/cache.h
+++ b/cache.h
@@ -138,6 +138,7 @@ struct cache_entry {
 	unsigned char sha1[20];
 	uint32_t ce_stat_crc;
 	struct cache_entry *next; /* used by name_hash */
+	struct cache_entry *next_ce;
 	char name[FLEX_ARRAY]; /* more */
 };
 
diff --git a/read-cache-v5.c b/read-cache-v5.c
index 01f1c88..797022f 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -631,9 +631,438 @@ static int read_index_v5(struct index_state *istate, void *mmap,
 	return 0;
 }
 
+#define WRITE_BUFFER_SIZE 8192
+static unsigned char write_buffer[WRITE_BUFFER_SIZE];
+static unsigned long write_buffer_len;
+
+static int ce_write_flush(int fd)
+{
+	unsigned int buffered = write_buffer_len;
+	if (buffered) {
+		if (write_in_full(fd, write_buffer, buffered) != buffered)
+			return -1;
+		write_buffer_len = 0;
+	}
+	return 0;
+}
+
+static int ce_write(uint32_t *crc, int fd, void *data, unsigned int len)
+{
+	if (crc)
+		*crc = crc32(*crc, (Bytef*)data, len);
+	while (len) {
+		unsigned int buffered = write_buffer_len;
+		unsigned int partial = WRITE_BUFFER_SIZE - buffered;
+		if (partial > len)
+			partial = len;
+		memcpy(write_buffer + buffered, data, partial);
+		buffered += partial;
+		if (buffered == WRITE_BUFFER_SIZE) {
+			write_buffer_len = buffered;
+			if (ce_write_flush(fd))
+				return -1;
+			buffered = 0;
+		}
+		write_buffer_len = buffered;
+		len -= partial;
+		data = (char *) data + partial;
+	}
+	return 0;
+}
+
+static int ce_flush(int fd)
+{
+	unsigned int left = write_buffer_len;
+
+	if (left)
+		write_buffer_len = 0;
+
+	if (write_in_full(fd, write_buffer, left) != left)
+		return -1;
+
+	return 0;
+}
+
+static void ce_smudge_racily_clean_entry(struct cache_entry *ce)
+{
+	/*
+	 * This method shall only be called if the timestamp of ce
+	 * is racy (check with is_racy_timestamp). If the timestamp
+	 * is racy, the writer will set the CE_SMUDGED flag.
+	 *
+	 * The reader (match_stat_basic) will then take care
+	 * of checking if the entry is really changed or not, by
+	 * taking into account the size and the stat_crc and if
+	 * that hasn't changed checking the sha1.
+	 */
+	ce->ce_flags |= CE_SMUDGED;
+}
+
+static char *super_directory(char *filename)
+{
+	char *super = dirname(filename);
+	if (!strcmp(super, "."))
+		return NULL;
+	return super;
+}
+
+static void ondisk_from_directory_entry(struct directory_entry *de,
+					struct ondisk_directory_entry *ondisk)
+{
+	ondisk->foffset   = htonl(de->de_foffset);
+	ondisk->nsubtrees = htonl(de->de_nsubtrees);
+	ondisk->nfiles    = htonl(de->de_nfiles);
+	ondisk->nentries  = htonl(de->de_nentries);
+	hashcpy(ondisk->sha1, de->sha1);
+	ondisk->flags     = htons(de->de_flags);
+	if (de->de_pathlen == 0) {
+		memcpy(ondisk->name, "\0", 1);
+	} else {
+		memcpy(ondisk->name, de->pathname, de->de_pathlen);
+		memcpy(ondisk->name + de->de_pathlen, "/\0", 2);
+	}
+}
+
+static void insert_directory_entry(struct directory_entry *de,
+				   struct hash_table *table,
+				   unsigned int *total_dir_len,
+				   unsigned int *ndir,
+				   uint32_t crc)
+{
+	struct directory_entry *insert;
+
+	insert = (struct directory_entry *)insert_hash(crc, de, table);
+	if (insert) {
+		de->next_hash = insert->next_hash;
+		insert->next_hash = de;
+	}
+	(*ndir)++;
+	if (de->de_pathlen == 0)
+		(*total_dir_len)++;
+	else
+		*total_dir_len += de->de_pathlen + 2;
+}
+
+static struct directory_entry *find_directory(char *dir, int dir_len, uint32_t *crc,
+					      struct hash_table *table)
+{
+	struct directory_entry *search;
+
+	*crc = crc32(0, (Bytef*)dir, dir_len);
+	search = lookup_hash(*crc, table);
+	while (search &&
+	       cache_name_compare(dir, dir_len, search->pathname, search->de_pathlen))
+		search = search->next_hash;
+	return search;
+}
+
+static struct directory_entry *get_directory(char *dir, unsigned int dir_len,
+					     struct hash_table *table,
+					     unsigned int *total_dir_len,
+					     unsigned int *ndir,
+					     struct directory_entry **current)
+{
+	struct directory_entry *tmp = NULL, *search, *new, *ret;
+	uint32_t crc;
+
+	search = find_directory(dir, dir_len, &crc, table);
+	if (search)
+		return search;
+	while (!search) {
+		new = init_directory_entry(dir, dir_len);
+		insert_directory_entry(new, table, total_dir_len, ndir, crc);
+		if (!tmp)
+			ret = new;
+		else
+			new->de_nsubtrees = 1;
+		new->next = tmp;
+		tmp = new;
+		dir = super_directory(dir);
+		dir_len = dir ? strlen(dir) : 0;
+		search = find_directory(dir, dir_len, &crc, table);
+	}
+	search->de_nsubtrees++;
+	(*current)->next = tmp;
+	while ((*current)->next)
+		*current = (*current)->next;
+
+	return ret;
+}
+
+static void ce_queue_push(struct cache_entry **head,
+			  struct cache_entry **tail,
+			  struct cache_entry *ce)
+{
+	if (!*head) {
+		*head = *tail = ce;
+		(*tail)->next_ce = NULL;
+		return;
+	}
+
+	(*tail)->next_ce = ce;
+	ce->next_ce = NULL;
+	*tail = (*tail)->next_ce;
+}
+
+static struct directory_entry *compile_directory_data(struct index_state *istate,
+						      int nfile, unsigned int *ndir,
+						      unsigned int *total_dir_len,
+						      unsigned int *total_file_len)
+{
+	int i, dir_len = -1;
+	char *dir;
+	struct directory_entry *de, *current, *search;
+	struct cache_entry **cache = istate->cache;
+	struct hash_table table;
+	uint32_t crc;
+
+	init_hash(&table);
+	de = init_directory_entry("", 0);
+	current = de;
+	*ndir = 1;
+	*total_dir_len = 1;
+	crc = crc32(0, (Bytef*)de->pathname, de->de_pathlen);
+	insert_hash(crc, de, &table);
+	for (i = 0; i < nfile; i++) {
+		if (cache[i]->ce_flags & CE_REMOVE)
+			continue;
+
+		if (dir_len < 0
+		    || !(!(dir_len < ce_namelen(cache[i]) && cache[i]->name[dir_len] != '/')
+			 && !strchr(cache[i]->name + dir_len + 1, '/')
+			 && !cache_name_compare(cache[i]->name, ce_namelen(cache[i]),
+						dir, dir_len))) {
+			dir = super_directory(strdup(cache[i]->name));
+			dir_len = dir ? strlen(dir) : 0;
+			search = get_directory(dir, dir_len, &table,
+					       total_dir_len, ndir,
+					       &current);
+		}
+		search->de_nfiles++;
+		*total_file_len += ce_namelen(cache[i]) + 1;
+		if (search->de_pathlen)
+			*total_file_len -= search->de_pathlen + 1;
+		ce_queue_push(&(search->ce), &(search->ce_last), cache[i]);
+	}
+	return de;
+}
+
+static void ondisk_from_cache_entry(struct cache_entry *ce,
+				    struct ondisk_cache_entry *ondisk,
+				    int pathlen)
+{
+	unsigned int flags;
+
+	flags  = ce->ce_flags & CE_STAGEMASK;
+	flags |= ce->ce_flags & CE_VALID;
+	flags |= ce->ce_flags & CE_SMUDGED;
+	if (ce->ce_flags & CE_INTENT_TO_ADD)
+		flags |= CE_INTENT_TO_ADD_V5;
+	if (ce->ce_flags & CE_SKIP_WORKTREE)
+		flags |= CE_SKIP_WORKTREE_V5;
+	ondisk->flags      = htons(flags);
+	ondisk->mode       = htons(ce->ce_mode);
+	ondisk->mtime.sec  = htonl(ce->ce_stat_data.sd_mtime.sec);
+#ifdef USE_NSEC
+	ondisk->mtime.nsec = htonl(ce->ce_stat_data.sd_mtime.nsec);
+#else
+	ondisk->mtime.nsec = 0;
+#endif
+	ondisk->size       = htonl(ce->ce_stat_data.sd_size);
+	if (!ce->ce_stat_crc)
+		ce->ce_stat_crc = calculate_stat_crc(ce);
+	ondisk->stat_crc   = htonl(ce->ce_stat_crc);
+	hashcpy(ondisk->sha1, ce->sha1);
+	memcpy(ondisk->name, ce->name + pathlen, ce_namelen(ce) - pathlen);
+	ondisk->name[ce_namelen(ce) - pathlen] = '\0';
+}
+
+static int write_directories(struct directory_entry *de, int fd)
+{
+	struct directory_entry *current;
+	struct ondisk_directory_entry *ondisk;
+	int current_offset, offset_write, ondisk_size, foffset;
+	uint32_t crc;
+
+	ondisk_size = offsetof(struct ondisk_directory_entry, name);
+	current = de;
+	current_offset = 0;
+	foffset = 0;
+	/* Write directory offsets */
+	while (current) {
+		int pathlen;
+
+		offset_write = htonl(current_offset);
+		if (ce_write(NULL, fd, &offset_write, 4) < 0)
+			return -1;
+		if (current->de_pathlen == 0)
+			pathlen = 0;
+		else
+			pathlen = current->de_pathlen + 1;
+		current_offset += pathlen + 1 + ondisk_size + 4;
+		current = current->next;
+	}
+	/*
+	 * Write one more offset, which points to the end of the entries,
+	 * because we use it for calculating the dir length, instead of
+	 * using strlen.
+	 */
+	offset_write = htonl(current_offset);
+	if (ce_write(NULL, fd, &offset_write, 4) < 0)
+		return -1;
+	current = de;
+	/* Write directory entries */
+	while (current) {
+		int size = ondisk_size + current->de_pathlen + 1;
+
+		crc = 0;
+		current->de_foffset = foffset;
+		if (current->de_pathlen != 0)
+			size++;
+		ondisk = xmalloc(size);
+		ondisk_from_directory_entry(current, ondisk);
+		if (ce_write(&crc, fd, ondisk, size) < 0)
+			return -1;
+		crc = htonl(crc);
+		if (ce_write(NULL, fd, &crc, 4) < 0)
+			return -1;
+		foffset += current->de_nfiles * 4;
+		free(ondisk);
+		current = current->next;
+	}
+	return 0;
+}
+
+static int write_entries(struct index_state *istate,
+			 struct directory_entry *de,
+			 int entries,
+			 int fd)
+{
+	int offset, offset_write, ondisk_size;
+	struct directory_entry *current;
+
+	offset = 0;
+	ondisk_size = offsetof(struct ondisk_cache_entry, name);
+	current = de;
+	/* Write cache entry offsets */
+	while (current) {
+		int pathlen;
+		struct cache_entry *ce = current->ce;
+
+		pathlen = current->de_pathlen ? current->de_pathlen + 1 : 0;
+		while (ce) {
+			if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
+				ce_smudge_racily_clean_entry(ce);
+			if (is_null_sha1(ce->sha1)) {
+				static const char msg[] = "cache entry has null sha1: %s";
+				static int allow = -1;
+
+				if (allow < 0)
+					allow = git_env_bool("GIT_ALLOW_NULL_SHA1", 0);
+				if (allow)
+					warning(msg, ce->name);
+				else
+					return error(msg, ce->name);
+			}
+			offset_write = htonl(offset);
+			if (ce_write(NULL, fd, &offset_write, 4) < 0)
+				return -1;
+			offset += ce_namelen(ce) - pathlen + 1 + ondisk_size + 4;
+			ce = ce->next_ce;
+		}
+		current = current->next;
+	}
+	/*
+	 * Write one more offset, which points to the end of the entries,
+	 * because we use it for calculating the file length, instead of
+	 * using strlen.
+	 */
+	offset_write = htonl(offset);
+	if (ce_write(NULL, fd, &offset_write, 4) < 0)
+		return -1;
+
+	current = de;
+	/* Write cache entries */
+	while (current) {
+		int pathlen;
+		struct cache_entry *ce = current->ce;
+
+		pathlen = current->de_pathlen ? current->de_pathlen + 1 : 0;
+		while (ce) {
+			int size = offsetof(struct ondisk_cache_entry, name) +
+				ce_namelen(ce) - pathlen + 1;
+			struct ondisk_cache_entry *ondisk = xmalloc(size);
+			uint32_t crc;
+
+			crc = 0;
+			ondisk_from_cache_entry(ce, ondisk, pathlen);
+			if (ce_write(&crc, fd, ondisk, size) < 0)
+				return -1;
+			crc = htonl(crc);
+			if (ce_write(NULL, fd, &crc, 4) < 0)
+				return -1;
+			offset += 4;
+			ce = ce->next_ce;
+		}
+		current = current->next;
+	}
+	return 0;
+}
+
+static int write_index_v5(struct index_state *istate, int newfd)
+{
+	struct cache_header hdr;
+	struct cache_header_v5 hdr_v5;
+	struct cache_entry **cache = istate->cache;
+	struct directory_entry *de;
+	unsigned int entries = istate->cache_nr;
+	unsigned int i, removed, total_dir_len;
+	unsigned int total_file_len, foffsetblock;
+	unsigned int ndir;
+	uint32_t crc;
+
+	if (istate->filter_opts)
+		die("BUG: index: cannot write a partially read index");
+
+	for (i = removed = 0; i < entries; i++) {
+		if (cache[i]->ce_flags & CE_REMOVE)
+			removed++;
+	}
+	hdr.hdr_signature = htonl(CACHE_SIGNATURE);
+	hdr.hdr_version = htonl(istate->version);
+	hdr.hdr_entries = htonl(entries - removed);
+	hdr_v5.hdr_nextension = htonl(0); /* Currently no extensions are supported */
+
+	total_dir_len = 0;
+	total_file_len = 0;
+	de = compile_directory_data(istate, entries, &ndir,
+				    &total_dir_len, &total_file_len);
+	hdr_v5.hdr_ndir = htonl(ndir);
+
+	foffsetblock = sizeof(hdr) + sizeof(hdr_v5) + 4
+		+ (ndir + 1) * 4
+		+ total_dir_len
+		+ ndir * (offsetof(struct ondisk_directory_entry, name) + 4);
+	hdr_v5.hdr_fblockoffset = htonl(foffsetblock + (entries - removed + 1) * 4);
+	crc = 0;
+	if (ce_write(&crc, newfd, &hdr, sizeof(hdr)) < 0)
+		return -1;
+	if (ce_write(&crc, newfd, &hdr_v5, sizeof(hdr_v5)) < 0)
+		return -1;
+	crc = htonl(crc);
+	if (ce_write(NULL, newfd, &crc, 4) < 0)
+		return -1;
+
+	if (write_directories(de, newfd) < 0)
+		return -1;
+	if (write_entries(istate, de, entries, newfd) < 0)
+		return -1;
+	return ce_flush(newfd);
+}
+
 struct index_ops v5_ops = {
 	match_stat_basic,
 	verify_hdr,
 	read_index_v5,
-	NULL
+	write_index_v5
 };
diff --git a/read-cache.c b/read-cache.c
index baa052c..46551af 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -106,7 +106,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
 	return changed;
 }
 
-static uint32_t calculate_stat_crc(struct cache_entry *ce)
+uint32_t calculate_stat_crc(struct cache_entry *ce)
 {
 	unsigned int ctimens = 0;
 	uint32_t stat, stat_crc;
@@ -227,6 +227,8 @@ static void set_istate_ops(struct index_state *istate)
 {
 	if (istate->version >= 2 && istate->version <= 4)
 		istate->ops = &v2_ops;
+	if (istate->version == 5)
+		istate->ops = &v5_ops;
 }
 
 int ce_match_stat_basic(const struct index_state *istate,
diff --git a/read-cache.h b/read-cache.h
index 7823fbb..9d66df6 100644
--- a/read-cache.h
+++ b/read-cache.h
@@ -61,5 +61,6 @@ extern int ce_match_stat_basic(const struct index_state *istate,
 			       const struct cache_entry *ce, struct stat *st);
 extern int is_racy_timestamp(const struct index_state *istate, const struct cache_entry *ce);
 extern void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce);
+extern uint32_t calculate_stat_crc(struct cache_entry *ce);
 
 #endif
-- 
1.8.4.2

* [PATCH v4 16/24] read-cache: write index-v5 cache-tree data
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (14 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 15/24] read-cache: write index-v5 Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 17/24] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Write the cache-tree data for the index version 5 file format. The
in-memory cache-tree data is converted to the ondisk format by adding
it to the directory entries that were compiled from the cache entries
in the previous step.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
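The lookup relies on the CRC of a directory path being computable
incrementally while walking down the cache-tree. Here is a minimal
standalone sketch of that property (not code from this patch). It uses
a small bitwise CRC-32 instead of zlib's crc32() so that it builds
without extra libraries; crc32_update() and dir_crc_descend() are
made-up names for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 (reflected polynomial 0xEDB88320) with zlib-style
 * chaining: pass 0 to start and the previous result to continue.
 */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf,
			     size_t len)
{
	size_t i;
	int bit;

	crc = ~crc;
	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * CRC of "<parent>/<name>" given only the CRC of "<parent>".  The
 * root directory is the empty string and gets no separator.
 */
static uint32_t dir_crc_descend(uint32_t parent_crc, int parent_is_root,
				const char *name, size_t namelen)
{
	uint32_t crc = parent_crc;

	if (!parent_is_root)
		crc = crc32_update(crc, (const unsigned char *)"/", 1);
	return crc32_update(crc, (const unsigned char *)name, namelen);
}
```

Descending from the root to "abc" and then to "def" yields the same
value as hashing "abc/def" in one go, which is the key under which
compile_directory_data() stored the directory entry.
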
 read-cache-v5.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/read-cache-v5.c b/read-cache-v5.c
index 797022f..0d06cfe 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -789,6 +789,57 @@ static struct directory_entry *get_directory(char *dir, unsigned int dir_len,
 	return ret;
 }
 
+static void convert_one_to_ondisk(struct hash_table *table, struct cache_tree *it,
+				  const char *path, int pathlen, uint32_t crc)
+{
+	int i, path_len = strlen(path);
+	struct directory_entry *search;
+
+	crc = crc32(crc, (Bytef*)path, pathlen);
+	search = lookup_hash(crc, table);
+	while (search && (path_len > search->de_pathlen
+			  || strcmp(path, search->pathname + search->de_pathlen - path_len)))
+		search = search->next_hash;
+	if (!search)
+		return;
+	/*
+	 * The number of subtrees is already calculated by
+	 * compile_directory_data, therefore we only need to
+	 * add the entry_count
+	 */
+	search->de_nentries = it->entry_count;
+	if (0 <= it->entry_count)
+		hashcpy(search->sha1, it->sha1);
+
+#if DEBUG
+	if (0 <= it->entry_count)
+		fprintf(stderr, "cache-tree <%.*s> (%d ent, %d subtree) %s\n",
+			pathlen, path, it->entry_count, it->subtree_nr,
+			sha1_to_hex(it->sha1));
+	else
+		fprintf(stderr, "cache-tree <%.*s> (%d subtree) invalid\n",
+			pathlen, path, it->subtree_nr);
+#endif
+
+	if (strcmp(path, ""))
+		crc = crc32(crc, (Bytef*)"/", 1);
+	for (i = 0; i < it->subtree_nr; i++) {
+		struct cache_tree_sub *down = it->down[i];
+		if (i) {
+			struct cache_tree_sub *prev = it->down[i-1];
+			if (subtree_name_cmp(down->name, down->namelen,
+					     prev->name, prev->namelen) <= 0)
+				die("fatal - unsorted cache subtree");
+		}
+		convert_one_to_ondisk(table, down->cache_tree, down->name, down->namelen, crc);
+	}
+}
+
+static void cache_tree_to_ondisk(struct hash_table *table, struct cache_tree *root)
+{
+	convert_one_to_ondisk(table, root, "", 0, 0);
+}
+
 static void ce_queue_push(struct cache_entry **head,
 			  struct cache_entry **tail,
 			  struct cache_entry *ce)
@@ -844,6 +895,8 @@ static struct directory_entry *compile_directory_data(struct index_state *istate
 			*total_file_len -= search->de_pathlen + 1;
 		ce_queue_push(&(search->ce), &(search->ce_last), cache[i]);
 	}
+	if (istate->cache_tree)
+		cache_tree_to_ondisk(&table, istate->cache_tree);
 	return de;
 }
 
-- 
1.8.4.2

* [PATCH v4 17/24] read-cache: write resolve-undo data for index-v5
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (15 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 16/24] read-cache: write index-v5 cache-tree data Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 18/24] update-index.c: rewrite index when index-version is given Thomas Gummerer
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Make git write the resolve-undo data to the index.

Since the resolve-undo data is joined with the conflicts in the ondisk
format of the index file version 5, conflicts and resolve-undo data are
written at the same time, after the resolve-undo data has been
converted from the in-memory format.

Helped-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
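For reference, here is a minimal standalone sketch of how large one
conflict record is on disk, following the order in which
write_conflict() emits its fields (not code from this patch). The
struct below is an assumption based on the three fields that
conflict_to_ondisk() fills in; the real struct ondisk_conflict_part may
differ. The names sketch_ondisk_conflict_part, conflict_record_size()
and put_be32() are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of one conflict part: flags, mode, SHA-1 */
struct sketch_ondisk_conflict_part {
	uint16_t flags;
	uint16_t entry_mode;
	unsigned char sha1[20];
};

/* Bytes taken by one conflict record with nparts stages */
static size_t conflict_record_size(size_t namelen, unsigned int nparts)
{
	return namelen + 1	/* path name and terminating NUL */
		+ 4		/* number of parts, network byte order */
		+ nparts * sizeof(struct sketch_ondisk_conflict_part)
		+ 4;		/* CRC32 over everything above */
}

/* Store v in network (big-endian) byte order, like htonl() + write */
static void put_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}
```

Under these assumptions a three-way conflict on "a.txt" takes
5 + 1 + 4 + 3 * 24 + 4 = 86 bytes.
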
 read-cache-v5.c | 199 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 191 insertions(+), 8 deletions(-)

diff --git a/read-cache-v5.c b/read-cache-v5.c
index 0d06cfe..a5e9b5a 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -723,6 +723,29 @@ static void ondisk_from_directory_entry(struct directory_entry *de,
 	}
 }
 
+static struct conflict_part *conflict_part_from_inmemory(struct cache_entry *ce)
+{
+	struct conflict_part *conflict;
+	int flags;
+
+	conflict = xmalloc(sizeof(struct conflict_part));
+	flags                = CONFLICT_CONFLICTED;
+	flags               |= ce_stage(ce) << CONFLICT_STAGESHIFT;
+	conflict->flags      = flags;
+	conflict->entry_mode = ce->ce_mode;
+	conflict->next       = NULL;
+	hashcpy(conflict->sha1, ce->sha1);
+	return conflict;
+}
+
+static void conflict_to_ondisk(struct conflict_part *cp,
+			       struct ondisk_conflict_part *ondisk)
+{
+	ondisk->flags      = htons(cp->flags);
+	ondisk->entry_mode = htons(cp->entry_mode);
+	hashcpy(ondisk->sha1, cp->sha1);
+}
+
 static void insert_directory_entry(struct directory_entry *de,
 				   struct hash_table *table,
 				   unsigned int *total_dir_len,
@@ -789,6 +812,11 @@ static struct directory_entry *get_directory(char *dir, unsigned int dir_len,
 	return ret;
 }
 
+static struct conflict_entry *create_conflict_entry_from_ce(struct cache_entry *ce)
+{
+	return create_new_conflict(ce->name, ce_namelen(ce));
+}
+
 static void convert_one_to_ondisk(struct hash_table *table, struct cache_tree *it,
 				  const char *path, int pathlen, uint32_t crc)
 {
@@ -840,6 +868,52 @@ static void cache_tree_to_ondisk(struct hash_table *table, struct cache_tree *ro
 	convert_one_to_ondisk(table, root, "", 0, 0);
 }
 
+static void resolve_undo_to_ondisk(struct string_list *resolve_undo,
+				   struct conflict_entry **conflict_queue)
+{
+	struct string_list_item *item;
+	struct conflict_entry *current = *conflict_queue;
+
+	if (!resolve_undo)
+		return;
+	for_each_string_list_item(item, resolve_undo) {
+		struct conflict_entry *conflict_entry;
+		struct resolve_undo_info *ui = item->util;
+		int i, len;
+
+		if (!ui)
+			continue;
+
+		len = strlen(item->string);
+		while (current && current->next &&
+		       cache_name_compare(current->name, current->namelen,
+					  item->string, len))
+			current = current->next;
+
+		conflict_entry = create_new_conflict(item->string, len);
+		for (i = 0; i < 3; i++) {
+			if (ui->mode[i]) {
+				struct conflict_part *cp;
+
+				cp = xmalloc(sizeof(struct conflict_part));
+				cp->flags = (i + 1) << CONFLICT_STAGESHIFT;
+				cp->entry_mode = ui->mode[i];
+				cp->next = NULL;
+				hashcpy(cp->sha1, ui->sha1[i]);
+				add_part_to_conflict_entry(conflict_entry, cp);
+			}
+		}
+		if (!*conflict_queue) {
+			*conflict_queue = conflict_entry;
+			conflict_entry->next = NULL;
+			current = conflict_entry;
+		} else {
+			conflict_entry->next = current->next;
+			current->next = conflict_entry;
+		}
+	}
+}
+
 static void ce_queue_push(struct cache_entry **head,
 			  struct cache_entry **tail,
 			  struct cache_entry *ce)
@@ -855,15 +929,32 @@ static void ce_queue_push(struct cache_entry **head,
 	*tail = (*tail)->next_ce;
 }
 
+static void conflict_queue_push(struct conflict_entry **head,
+				struct conflict_entry **tail,
+				struct conflict_entry *conflict)
+{
+	if (!*head) {
+		*head = *tail = conflict;
+		(*tail)->next = NULL;
+		return;
+	}
+
+	(*tail)->next = conflict;
+	conflict->next = NULL;
+	*tail = (*tail)->next;
+}
+
 static struct directory_entry *compile_directory_data(struct index_state *istate,
 						      int nfile, unsigned int *ndir,
 						      unsigned int *total_dir_len,
-						      unsigned int *total_file_len)
+						      unsigned int *total_file_len,
+						      struct conflict_entry **conflict_queue)
 {
 	int i, dir_len = -1;
 	char *dir;
 	struct directory_entry *de, *current, *search;
 	struct cache_entry **cache = istate->cache;
+	struct conflict_entry *conflict_entry = NULL, *tail;
 	struct hash_table table;
 	uint32_t crc;
 
@@ -894,9 +985,22 @@ static struct directory_entry *compile_directory_data(struct index_state *istate
 		if (search->de_pathlen)
 			*total_file_len -= search->de_pathlen + 1;
 		ce_queue_push(&(search->ce), &(search->ce_last), cache[i]);
+
+		if (ce_stage(cache[i]) > 0) {
+			struct conflict_part *conflict_part;
+			if (!conflict_entry ||
+			    cache_name_compare(conflict_entry->name, conflict_entry->namelen,
+					       cache[i]->name, ce_namelen(cache[i]))) {
+				conflict_entry = create_conflict_entry_from_ce(cache[i]);
+				conflict_queue_push(conflict_queue, &tail, conflict_entry);
+			}
+			conflict_part = conflict_part_from_inmemory(cache[i]);
+			add_part_to_conflict_entry(conflict_entry, conflict_part);
+		}
 	}
 	if (istate->cache_tree)
 		cache_tree_to_ondisk(&table, istate->cache_tree);
+	resolve_undo_to_ondisk(istate->resolve_undo, conflict_queue);
 	return de;
 }
 
@@ -1062,16 +1166,82 @@ static int write_entries(struct index_state *istate,
 	return 0;
 }
 
+static int write_conflict(struct conflict_entry *conflict, int fd)
+{
+	struct conflict_entry *current;
+	struct conflict_part *current_part;
+	uint32_t crc;
+
+	current = conflict;
+	while (current) {
+		unsigned int to_write, i;
+
+		crc = 0;
+		if (ce_write(&crc, fd, current->name, current->namelen) < 0)
+			return -1;
+		if (ce_write(&crc, fd, "\0", 1) < 0)
+			return -1;
+		to_write = htonl(current->nfileconflicts);
+		if (ce_write(&crc, fd, (Bytef*)&to_write, 4) < 0)
+			return -1;
+		current_part = current->entries;
+		for (i = 0; i < current->nfileconflicts; i++) {
+			struct ondisk_conflict_part ondisk;
+
+			conflict_to_ondisk(current_part, &ondisk);
+			if (ce_write(&crc, fd, (Bytef*)&ondisk, sizeof(struct ondisk_conflict_part)) < 0)
+				return -1;
+			current_part = current_part->next;
+		}
+		crc = htonl(crc);
+		if (ce_write(NULL, fd, &crc, 4) < 0)
+			return -1;
+		current = current->next;
+	}
+	return 0;
+}
+
+static int write_resolve_undo(struct index_state *istate,
+			      struct conflict_entry *conflict_queue,
+			      int fd)
+{
+	struct conflict_entry *current;
+	int nr = 0;
+	uint32_t crc = 0, to_write;
+
+	/* Just count */
+	for (current = conflict_queue; current; current = current->next)
+		nr++;
+
+	if (ce_write(&crc, fd, "REUC", 4) < 0)
+		return -1;
+	to_write = htonl(nr);
+	if (ce_write(&crc, fd, &to_write, 4) < 0)
+		return -1;
+	to_write = htonl(crc);
+	if (ce_write(NULL, fd, &to_write, 4) < 0)
+		return -1;
+
+	/*
+	 * write_conflict() walks the whole list starting at its
+	 * argument, so a single call writes every entry once.
+	 */
+	if (write_conflict(conflict_queue, fd) < 0)
+		return -1;
+	return 0;
+}
+
 static int write_index_v5(struct index_state *istate, int newfd)
 {
 	struct cache_header hdr;
 	struct cache_header_v5 hdr_v5;
 	struct cache_entry **cache = istate->cache;
 	struct directory_entry *de;
+	struct conflict_entry *conflict_queue = NULL;
 	unsigned int entries = istate->cache_nr;
 	unsigned int i, removed, total_dir_len;
 	unsigned int total_file_len, foffsetblock;
-	unsigned int ndir;
+	unsigned int ndir, extoffset, nextension;
 	uint32_t crc;
 
 	if (istate->filter_opts)
@@ -1084,24 +1254,34 @@ static int write_index_v5(struct index_state *istate, int newfd)
 	hdr.hdr_signature = htonl(CACHE_SIGNATURE);
 	hdr.hdr_version = htonl(istate->version);
 	hdr.hdr_entries = htonl(entries - removed);
-	hdr_v5.hdr_nextension = htonl(0); /* Currently no extensions are supported */
 
 	total_dir_len = 0;
 	total_file_len = 0;
 	de = compile_directory_data(istate, entries, &ndir,
-				    &total_dir_len, &total_file_len);
+				    &total_dir_len, &total_file_len,
+				    &conflict_queue);
 	hdr_v5.hdr_ndir = htonl(ndir);
 
-	foffsetblock = sizeof(hdr) + sizeof(hdr_v5) + 4
-		+ (ndir + 1) * 4
-		+ total_dir_len
-		+ ndir * (offsetof(struct ondisk_directory_entry, name) + 4);
+	nextension = (istate->resolve_undo || conflict_queue) ? 1 : 0;
+	foffsetblock = sizeof(hdr) + sizeof(hdr_v5) + (nextension * 4) + 4 +
+		(ndir + 1) * 4 + total_dir_len +
+		ndir * (offsetof(struct ondisk_directory_entry, name) + 4);
 	hdr_v5.hdr_fblockoffset = htonl(foffsetblock + (entries - removed + 1) * 4);
+	hdr_v5.hdr_nextension = htonl(nextension);
+
 	crc = 0;
 	if (ce_write(&crc, newfd, &hdr, sizeof(hdr)) < 0)
 		return -1;
 	if (ce_write(&crc, newfd, &hdr_v5, sizeof(hdr_v5)) < 0)
 		return -1;
+
+	if (nextension) {
+		extoffset = foffsetblock + (entries - removed + 1) * 4 + total_file_len +
+			(entries - removed) * (offsetof(struct ondisk_cache_entry, name) + 4);
+		extoffset = htonl(extoffset);
+		if (ce_write(&crc, newfd, &extoffset, 4) < 0)
+			return -1;
+	}
 	crc = htonl(crc);
 	if (ce_write(NULL, newfd, &crc, 4) < 0)
 		return -1;
@@ -1110,6 +1290,9 @@ static int write_index_v5(struct index_state *istate, int newfd)
 		return -1;
 	if (write_entries(istate, de, entries, newfd) < 0)
 		return -1;
+	if (nextension)
+		if (write_resolve_undo(istate, conflict_queue, newfd) < 0)
+			return -1;
 	return ce_flush(newfd);
 }
 
-- 
1.8.4.2


* [PATCH v4 18/24] update-index.c: rewrite index when index-version is given
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (16 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 17/24] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 19/24] p0003-index.sh: add perf test for the index formats Thomas Gummerer
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Make update-index always rewrite the index when an index-version
is given, even if the index already has the right version.
This makes the option usable for performance testing the index
writer and reader.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/update-index.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index c5bb889..8b3f7a0 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -6,6 +6,7 @@
 #include "cache.h"
 #include "quote.h"
 #include "cache-tree.h"
+#include "read-cache.h"
 #include "tree-walk.h"
 #include "builtin.h"
 #include "refs.h"
@@ -861,8 +862,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			    preferred_index_format,
 			    INDEX_FORMAT_LB, INDEX_FORMAT_UB);
 
-		if (the_index.version != preferred_index_format)
-			active_cache_changed = 1;
+		active_cache_changed = 1;
 		change_cache_version(preferred_index_format);
 	}
 
-- 
1.8.4.2


* [PATCH v4 19/24] p0003-index.sh: add perf test for the index formats
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (17 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 18/24] update-index.c: rewrite index when index-version is given Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 20/24] introduce GIT_INDEX_VERSION environment variable Thomas Gummerer
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay, Thomas Rast

From: Thomas Rast <trast@inf.ethz.ch>

Add a performance test for index versions [23]/4/5 using
git update-index --index-version=x, which exercises both the reader
and the writer of each index format.

Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 t/perf/p0003-index.sh | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100755 t/perf/p0003-index.sh

diff --git a/t/perf/p0003-index.sh b/t/perf/p0003-index.sh
new file mode 100755
index 0000000..5360175
--- /dev/null
+++ b/t/perf/p0003-index.sh
@@ -0,0 +1,63 @@
+#!/bin/sh
+
+test_description="Tests index versions [23]/4/5"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success "convert to v3" "
+	git update-index --index-version=2
+"
+
+test_perf "v[23]: update-index" "
+	git update-index --index-version=2 >/dev/null
+"
+
+subdir=$(git ls-files | sed 's#/[^/]*$##' | grep -v '^$' | uniq | tail -n 30 | head -1)
+
+test_perf "v[23]: grep nonexistent -- subdir" "
+	test_must_fail git grep nonexistent -- $subdir >/dev/null
+"
+
+test_perf "v[23]: ls-files -- subdir" "
+	git ls-files $subdir >/dev/null
+"
+
+test_expect_success "convert to v4" "
+	git update-index --index-version=4
+"
+
+test_perf "v4: update-index" "
+	git update-index --index-version=4 >/dev/null
+"
+
+test_perf "v4: grep nonexistent -- subdir" "
+	test_must_fail git grep nonexistent -- $subdir >/dev/null
+"
+
+test_perf "v4: ls-files -- subdir" "
+	git ls-files $subdir >/dev/null
+"
+
+test_expect_success "convert to v5" "
+	git update-index --index-version=5
+"
+
+test_perf "v5: update-index" "
+	git update-index --index-version=5 >/dev/null
+"
+
+test_perf "v5: ls-files" "
+	git ls-files >/dev/null
+"
+
+test_perf "v5: grep nonexistent -- subdir" "
+	test_must_fail git grep nonexistent -- $subdir >/dev/null
+"
+
+test_perf "v5: ls-files -- subdir" "
+	git ls-files $subdir >/dev/null
+"
+
+test_done
-- 
1.8.4.2


* [PATCH v4 20/24] introduce GIT_INDEX_VERSION environment variable
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (18 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 19/24] p0003-index.sh: add perf test for the index formats Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 21:57   ` Eric Sunshine
  2013-11-27 12:00 ` [PATCH v4 21/24] test-lib: allow setting the index format version Thomas Gummerer
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Respect the GIT_INDEX_VERSION environment variable when a new index is
initialized.  For additional safety, setting the variable does not cause
existing index files to be converted to another format.
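
The hunk below reads the variable with atoi(), which silently maps
malformed input to 0.  A minimal sketch of a stricter parse, assuming
the supported range is 2..5 and the default is 3; the SKETCH_* names and
sketch_index_version() are hypothetical and not part of this patch:

```c
#include <errno.h>
#include <stdlib.h>

/* Assumed bounds for illustration; the series uses INDEX_FORMAT_LB/UB. */
#define SKETCH_FORMAT_LB 2
#define SKETCH_FORMAT_UB 5
#define SKETCH_FORMAT_DEFAULT 3

/*
 * Return the version named by `value`, or SKETCH_FORMAT_DEFAULT when
 * `value` is NULL, empty, not a complete decimal number, or out of range.
 */
int sketch_index_version(const char *value)
{
	char *end;
	long v;

	if (!value || !*value)
		return SKETCH_FORMAT_DEFAULT;
	errno = 0;
	v = strtol(value, &end, 10);
	if (errno || *end != '\0')
		return SKETCH_FORMAT_DEFAULT;
	if (v < SKETCH_FORMAT_LB || v > SKETCH_FORMAT_UB)
		return SKETCH_FORMAT_DEFAULT;
	return (int)v;
}
```

A caller would pass getenv("GIT_INDEX_VERSION") directly.  Whether bad
input should fall back to the default or be reported as an error is a
design choice this sketch does not settle.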

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Documentation/git.txt | 5 +++++
 read-cache.c          | 9 +++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 10cddb5..2b2aad5 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -703,6 +703,11 @@ Git so take care if using Cogito etc.
 	index file. If not specified, the default of `$GIT_DIR/index`
 	is used.
 
+'GIT_INDEX_VERSION'::
+	This environment variable allows the specification of an index
+	version for new repositories.  It won't affect existing index
+	files.  By default index file version 3 is used.
+
 'GIT_OBJECT_DIRECTORY'::
 	If the object storage directory is specified via this
 	environment variable then the sha1 directories are created
diff --git a/read-cache.c b/read-cache.c
index 46551af..04430e5 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1233,8 +1233,13 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 void initialize_index(struct index_state *istate, int version)
 {
 	istate->initialized = 1;
-	if (!version)
-		version = INDEX_FORMAT_DEFAULT;
+	if (!version) {
+		char *envversion = getenv("GIT_INDEX_VERSION");
+		if (!envversion)
+			version = INDEX_FORMAT_DEFAULT;
+		else
+			version = atoi(envversion);
+	}
 	istate->version = version;
 	set_istate_ops(istate);
 }
-- 
1.8.4.2


* [PATCH v4 21/24] test-lib: allow setting the index format version
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (19 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 20/24] introduce GIT_INDEX_VERSION environment variable Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 22/24] t1600: add index v5 specific tests Thomas Gummerer
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

When running the test suite, it should be possible to set the default
index format for the tests.  Do that by allowing the user to add a
TEST_GIT_INDEX_VERSION variable in config.mak setting the index version.

If it isn't set, the default version given in the source code is
used (currently version 3).

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Makefile                | 7 +++++++
 t/test-lib-functions.sh | 5 +++++
 t/test-lib.sh           | 3 +++
 3 files changed, 15 insertions(+)

diff --git a/Makefile b/Makefile
index 6a1b054..8539548 100644
--- a/Makefile
+++ b/Makefile
@@ -342,6 +342,10 @@ all::
 # Define DEFAULT_HELP_FORMAT to "man", "info" or "html"
 # (defaults to "man") if you want to have a different default when
 # "git help" is called without a parameter specifying the format.
+#
+# Define TEST_GIT_INDEX_VERSION to 2, 3, 4 or 5 to run the test suite
+# with a different index file format.  If it isn't set, the index file
+# format used is index-v[23].
 
 GIT-VERSION-FILE: FORCE
 	@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -2218,6 +2222,9 @@ endif
 ifdef GIT_PERF_MAKE_OPTS
 	@echo GIT_PERF_MAKE_OPTS=\''$(subst ','\'',$(subst ','\'',$(GIT_PERF_MAKE_OPTS)))'\' >>$@
 endif
+ifdef TEST_GIT_INDEX_VERSION
+	@echo TEST_GIT_INDEX_VERSION='$(subst ','\'',$(subst ','\'',$(TEST_GIT_INDEX_VERSION)))' >>$@
+endif
 
 ### Detect Python interpreter path changes
 ifndef NO_PYTHON
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 2f79146..4034262 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -31,6 +31,11 @@ test_set_editor () {
 	export EDITOR
 }
 
+test_set_index_version () {
+	GIT_INDEX_VERSION="$1"
+	export GIT_INDEX_VERSION
+}
+
 test_decode_color () {
 	awk '
 		function name(n) {
diff --git a/t/test-lib.sh b/t/test-lib.sh
index b25249e..d9e810c 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -104,6 +104,9 @@ export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME
 export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
 export EDITOR
 
+GIT_INDEX_VERSION="$TEST_GIT_INDEX_VERSION"
+export GIT_INDEX_VERSION
+
 # Add libc MALLOC and MALLOC_PERTURB test
 # only if we are not executing the test with valgrind
 if expr " $GIT_TEST_OPTS " : ".* --valgrind " >/dev/null ||
-- 
1.8.4.2


* [PATCH v4 22/24] t1600: add index v5 specific tests
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (20 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 21/24] test-lib: allow setting the index format version Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-27 12:00 ` [PATCH v4 23/24] POC for partial writing Thomas Gummerer
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Add a test that tests only index v5 specific corner cases, to protect
against breaking them in the future.

Currently there is only one known case: the sorting is broken if the
index is read filtered with two pathspecs of different lengths.
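
The expected output in the test lists abc/def/xyz before def/xyz, which
is the bytewise order of the full paths.  A minimal sketch of such a
comparison, assuming entries are ordered by their complete path names;
sketch_name_compare() is a hypothetical helper, not a function from this
series:

```c
#include <string.h>

/*
 * Compare two paths bytewise; when one is a prefix of the other, the
 * shorter name sorts first.  Returns <0, 0 or >0 like memcmp().
 */
int sketch_name_compare(const char *a, size_t alen,
			const char *b, size_t blen)
{
	size_t len = alen < blen ? alen : blen;
	int cmp = memcmp(a, b, len);

	if (cmp)
		return cmp;
	if (alen < blen)
		return -1;
	if (alen > blen)
		return 1;
	return 0;
}
```

With this ordering the longer path abc/def/xyz still sorts before
def/xyz, because the first differing byte decides and the lengths only
matter for a common prefix.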

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 t/t1600-index-v5.sh | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
 create mode 100755 t/t1600-index-v5.sh

diff --git a/t/t1600-index-v5.sh b/t/t1600-index-v5.sh
new file mode 100755
index 0000000..fe68976
--- /dev/null
+++ b/t/t1600-index-v5.sh
@@ -0,0 +1,25 @@
+#!/bin/sh
+
+test_description="Test index-v5 specific corner cases"
+
+. ./test-lib.sh
+
+test_set_index_version 5
+
+test_expect_success 'setup' '
+	mkdir -p abc/def def &&
+	touch abc/def/xyz def/xyz &&
+	git add . &&
+	git commit -m "test commit"
+'
+
+test_expect_success 'ls-files ordering correct' '
+	cat <<-\EOF >expected &&
+	abc/def/xyz
+	def/xyz
+	EOF
+	git ls-files abc/def/xyz def/xyz >actual &&
+	test_cmp expected actual
+'
+
+test_done
-- 
1.8.4.2


* [PATCH v4 23/24] POC for partial writing
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (21 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 22/24] t1600: add index v5 specific tests Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-11-30  9:58   ` Duy Nguyen
  2013-11-27 12:00 ` [PATCH v4 24/24] perf: add partial writing test Thomas Gummerer
  2013-12-09 10:14 ` [PATCH v4 00/24] Index-v5 Thomas Gummerer
  24 siblings, 1 reply; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

This makes update-index use both partial reading and partial writing.
Partial reading is only used when no option other than the paths is
passed to the command.

This passes the test suite, but doesn't behave correctly when a write
fails.  A log should be written to the lock file, in order to be able to
recover if a write fails.
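
Partial writing, as used here, means seeking to the offset an entry was
read from and overwriting the record and its checksum in place, without
changing the record's length.  A minimal sketch of that idea, assuming
stdio instead of the file descriptor and ce_write() used in the patch,
and assuming the standard CRC-32; sketch_crc32() and
sketch_rewrite_record() are hypothetical names:

```c
#include <stdint.h>
#include <stdio.h>

/* Standard CRC-32 (reflected, polynomial 0xEDB88320), computed bitwise. */
uint32_t sketch_crc32(const unsigned char *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Overwrite `len` bytes at byte offset `pos` of `f`, followed by the
 * record's CRC-32 in network byte order.  Returns 0 on success and -1
 * on error.  The record length must not change, otherwise the records
 * that follow would be clobbered.
 */
int sketch_rewrite_record(FILE *f, long pos,
			  const unsigned char *rec, size_t len)
{
	uint32_t crc = sketch_crc32(rec, len);
	unsigned char be[4];

	be[0] = (unsigned char)(crc >> 24);
	be[1] = (unsigned char)(crc >> 16);
	be[2] = (unsigned char)(crc >> 8);
	be[3] = (unsigned char)crc;

	if (fseek(f, pos, SEEK_SET) != 0)
		return -1;
	if (fwrite(rec, 1, len, f) != len || fwrite(be, 1, 4, f) != 4)
		return -1;
	return fflush(f) == 0 ? 0 : -1;
}
```

The sketch shares the weakness described above: if the process dies
between the two fwrite() calls, the file holds a record whose checksum
no longer matches, which is why a recovery log is needed.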
---
 builtin/update-index.c |  43 +++++++++++---
 cache-tree.c           |  13 +++++
 cache-tree.h           |   1 +
 cache.h                |  27 ++++++++-
 lockfile.c             |   2 +-
 read-cache-v2.c        |   2 +
 read-cache-v5.c        | 154 ++++++++++++++++++++++++++++++++++++++++---------
 read-cache.c           |  30 ++++++++++
 read-cache.h           |   1 +
 resolve-undo.c         |   1 +
 10 files changed, 237 insertions(+), 37 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 8b3f7a0..69f0949 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -56,6 +56,7 @@ static int mark_ce_flags(const char *path, int flag, int mark)
 		else
 			active_cache[pos]->ce_flags &= ~flag;
 		cache_tree_invalidate_path(active_cache_tree, path);
+		the_index.needs_rewrite = 1;
 		active_cache_changed = 1;
 		return 0;
 	}
@@ -99,6 +100,8 @@ static int add_one_path(const struct cache_entry *old, const char *path, int len
 	memcpy(ce->name, path, len);
 	ce->ce_flags = create_ce_flags(0);
 	ce->ce_namelen = len;
+	if (old)
+		ce->entry_pos = old->entry_pos;
 	fill_stat_cache_info(ce, st);
 	ce->ce_mode = ce_mode_from_stat(old, st->st_mode);
 
@@ -268,6 +271,7 @@ static void chmod_path(int flip, const char *path)
 		goto fail;
 	}
 	cache_tree_invalidate_path(active_cache_tree, path);
+	the_index.needs_rewrite = 1;
 	active_cache_changed = 1;
 	report("chmod %cx '%s'", flip, path);
 	return;
@@ -706,15 +710,18 @@ static int reupdate_callback(struct parse_opt_ctx_t *ctx,
 
 int cmd_update_index(int argc, const char **argv, const char *prefix)
 {
-	int newfd, entries, has_errors = 0, line_termination = '\n';
+	int newfd, has_errors = 0, line_termination = '\n';
 	int read_from_stdin = 0;
 	int prefix_length = prefix ? strlen(prefix) : 0;
 	int preferred_index_format = 0;
 	char set_executable_bit = 0;
 	struct refresh_params refresh_args = {0, &has_errors};
 	int lock_error = 0;
+	struct filter_opts opts;
+	struct pathspec pathspec;
 	struct lock_file *lock_file;
 	struct parse_opt_ctx_t ctx;
+	int i, needs_full_read = 0;
 	int parseopt_state = PARSE_OPT_UNKNOWN;
 	struct option options[] = {
 		OPT_BIT('q', NULL, &refresh_args.flags,
@@ -810,9 +817,23 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	if (newfd < 0)
 		lock_error = errno;
 
-	entries = read_cache();
-	if (entries < 0)
-		die("cache corrupted");
+	for (i = 0; i < argc; i++) {
+		if (!prefixcmp(argv[i], "--"))
+			needs_full_read = 1;
+	}
+	if (!needs_full_read) {
+		memset(&opts, 0, sizeof(struct filter_opts));
+		parse_pathspec(&pathspec, 0,
+			       PATHSPEC_PREFER_CWD |
+			       PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
+			       prefix, argv + 1);
+		opts.pathspec = &pathspec;
+		if (read_cache_filtered(&opts) < 0)
+			die("cache corrupted");
+	} else {
+		if (read_cache() < 0)
+			die("cache corrupted");
+	}
 
 	/*
 	 * Custom copy of parse_options() because we want to handle
@@ -862,6 +883,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			    preferred_index_format,
 			    INDEX_FORMAT_LB, INDEX_FORMAT_UB);
 
+		the_index.needs_rewrite = 1;
 		active_cache_changed = 1;
 		change_cache_version(preferred_index_format);
 	}
@@ -890,17 +912,22 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	}
 
 	if (active_cache_changed) {
+		int r;
 		if (newfd < 0) {
 			if (refresh_args.flags & REFRESH_QUIET)
 				exit(128);
 			unable_to_lock_index_die(get_index_file(), lock_error);
 		}
-		if (write_cache(newfd, active_cache, active_nr) ||
-		    commit_locked_index(lock_file))
+		r = write_cache_partial(newfd);
+		if (r < 0)
 			die("Unable to write new index file");
+		else if (r == 0)
+			commit_lock_file(lock_file);
+		else
+			remove_lock_file();
+	} else {
+		rollback_lock_file(lock_file);
 	}
 
-	rollback_lock_file(lock_file);
-
 	return has_errors ? 1 : 0;
 }
diff --git a/cache-tree.c b/cache-tree.c
index 1209732..a3d18bb 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -123,6 +123,15 @@ void cache_tree_invalidate_path(struct cache_tree *it, const char *path)
 		return;
 	slash = strchr(path, '/');
 	it->entry_count = -1;
+	/*
+	 * Mark the cache_tree directory entry as invalid too. The
+	 * entry_count defines if the tree is valid, so we don't need
+	 * to reset any other field.
+	 */
+	if (it->de_ref) {
+		it->de_ref->de_nentries = -1;
+		it->de_ref->changed = 1;
+	}
 	if (!slash) {
 		int pos;
 		namelen = strlen(path);
@@ -140,6 +149,10 @@ void cache_tree_invalidate_path(struct cache_tree *it, const char *path)
 				sizeof(struct cache_tree_sub *) *
 				(it->subtree_nr - pos - 1));
 			it->subtree_nr--;
+			if (it->de_ref) {
+				it->de_ref->de_nsubtrees--;
+				it->de_ref->changed = 1;
+			}
 		}
 		return;
 	}
diff --git a/cache-tree.h b/cache-tree.h
index 9818926..eaf14a9 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -18,6 +18,7 @@ struct cache_tree {
 	unsigned char sha1[20];
 	int subtree_nr;
 	int subtree_alloc;
+	struct directory_entry *de_ref;
 	struct cache_tree_sub **down;
 };
 
diff --git a/cache.h b/cache.h
index 71b98cf..1a634dc 100644
--- a/cache.h
+++ b/cache.h
@@ -137,11 +137,31 @@ struct cache_entry {
 	unsigned int ce_namelen;
 	unsigned char sha1[20];
 	uint32_t ce_stat_crc;
+	unsigned int entry_pos;
+	unsigned int changed;
 	struct cache_entry *next; /* used by name_hash */
 	struct cache_entry *next_ce;
 	char name[FLEX_ARRAY]; /* more */
 };
 
+struct directory_entry {
+	struct directory_entry **sub;
+	struct directory_entry *next;
+	struct directory_entry *next_hash;
+	struct cache_entry *ce;
+	struct cache_entry *ce_last;
+	uint32_t de_foffset;
+	uint32_t de_nsubtrees;
+	uint32_t de_nfiles;
+	uint32_t de_nentries;
+	unsigned char sha1[20];
+	uint16_t de_flags;
+	uint32_t de_pathlen;
+	uint32_t entry_pos;
+	unsigned int changed;
+	char pathname[FLEX_ARRAY];
+};
+
 #define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
 #define CE_EXTENDED  (0x4000)
@@ -317,13 +337,15 @@ struct filter_opts {
 
 struct index_state {
 	struct cache_entry **cache;
+	struct directory_entry *root_directory;
 	unsigned int version;
 	unsigned int cache_nr, cache_alloc, cache_changed;
 	struct string_list *resolve_undo;
 	struct cache_tree *cache_tree;
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
-		 initialized : 1;
+		 initialized : 1,
+		 needs_rewrite : 1;
 	struct hash_table name_hash;
 	struct hash_table dir_hash;
 	struct index_ops *ops;
@@ -353,6 +375,7 @@ extern void free_name_hash(struct index_state *istate);
 #define is_cache_unborn() is_index_unborn(&the_index)
 #define read_cache_unmerged() read_index_unmerged(&the_index)
 #define write_cache(newfd, cache, entries) write_index(&the_index, (newfd))
+#define write_cache_partial(newfd) write_index_partial(&the_index, (newfd))
 #define discard_cache() discard_index(&the_index)
 #define unmerged_cache() unmerged_index(&the_index)
 #define cache_name_pos(name, namelen) index_name_pos(&the_index,(name),(namelen))
@@ -529,6 +552,7 @@ extern int read_index_from(struct index_state *, const char *path);
 extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
 extern int write_index(struct index_state *, int newfd);
+extern int write_index_partial(struct index_state *, int newfd);
 extern int discard_index(struct index_state *);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
@@ -613,6 +637,7 @@ extern NORETURN void unable_to_lock_index_die(const char *path, int err);
 extern int hold_lock_file_for_update(struct lock_file *, const char *path, int);
 extern int hold_lock_file_for_append(struct lock_file *, const char *path, int);
 extern int commit_lock_file(struct lock_file *);
+extern void remove_lock_file(void);
 extern void update_index_if_able(struct index_state *, struct lock_file *);
 
 extern int hold_locked_index(struct lock_file *, int);
diff --git a/lockfile.c b/lockfile.c
index 8fbcb6a..c150e5c 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -7,7 +7,7 @@
 static struct lock_file *lock_file_list;
 static const char *alternate_index_output;
 
-static void remove_lock_file(void)
+void remove_lock_file(void)
 {
 	pid_t me = getpid();
 
diff --git a/read-cache-v2.c b/read-cache-v2.c
index f884c10..1fec892 100644
--- a/read-cache-v2.c
+++ b/read-cache-v2.c
@@ -555,5 +555,7 @@ struct index_ops v2_ops = {
 	match_stat_basic,
 	verify_hdr,
 	read_index_v2,
+	write_index_v2,
+	/* Partial writing is the same as writing the full index for v2 */
 	write_index_v2
 };
diff --git a/read-cache-v5.c b/read-cache-v5.c
index a5e9b5a..13436a3 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -20,22 +20,6 @@ struct extension_header {
 	uint32_t crc;
 };
 
-struct directory_entry {
-	struct directory_entry **sub;
-	struct directory_entry *next;
-	struct directory_entry *next_hash;
-	struct cache_entry *ce;
-	struct cache_entry *ce_last;
-	uint32_t de_foffset;
-	uint32_t de_nsubtrees;
-	uint32_t de_nfiles;
-	uint32_t de_nentries;
-	unsigned char sha1[20];
-	uint16_t de_flags;
-	uint32_t de_pathlen;
-	char pathname[FLEX_ARRAY];
-};
-
 struct conflict_part {
 	struct conflict_part *next;
 	uint16_t flags;
@@ -246,7 +230,7 @@ static struct directory_entry *read_directories(unsigned int *dir_offset,
 		offsetof(struct ondisk_directory_entry, name) - 5;
 	disk_de = ptr_add(mmap, *dir_offset);
 	de = directory_entry_from_ondisk(disk_de, len);
-
+	de->entry_pos = *dir_offset;
 	data_len = len + 1 + offsetof(struct ondisk_directory_entry, name);
 	filecrc = ptr_add(mmap, *dir_offset + data_len);
 	if (!check_crc32(0, ptr_add(mmap, *dir_offset), data_len, ntoh_l(*filecrc)))
@@ -281,6 +265,7 @@ static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
 	entry_offset = first_entry_offset + ntoh_l(*beginning);
 	disk_ce = ptr_add(mmap, entry_offset);
 	*ce = cache_entry_from_ondisk(disk_ce, pathname, len, pathlen);
+	(*ce)->entry_pos = entry_offset;
 	filecrc = ptr_add(mmap, entry_offset + len + 1 + sizeof(*disk_ce));
 	if (!check_crc32(0,
 		ptr_add(mmap, entry_offset), len + 1 + sizeof(*disk_ce),
@@ -439,6 +424,7 @@ static struct cache_tree *convert_one(struct directory_entry *de)
 
 	it = cache_tree();
 	it->entry_count = de->de_nentries;
+	it->de_ref = de;
 	if (0 <= it->entry_count)
 		hashcpy(it->sha1, de->sha1);
 
@@ -523,14 +509,6 @@ static int read_entries(struct index_state *istate, struct directory_entry *de,
 	return 0;
 }
 
-static void free_directory_tree(struct directory_entry *de) {
-	int i;
-
-	for (i = 0; i < de->de_pathlen; i++)
-		free_directory_tree(de->sub[i]);
-	free(de);
-}
-
 /*
  * Read an index-v5 file filtered by the filter_opts.   If opts is NULL,
  * everything will be read.
@@ -626,7 +604,7 @@ static int read_index_v5(struct index_state *istate, void *mmap,
 		}
 	}
 	istate->cache_tree = cache_tree_convert_v5(root_directory);
-	free_directory_tree(root_directory);
+	istate->root_directory = root_directory;
 	istate->cache_nr = nr;
 	return 0;
 }
@@ -696,6 +674,7 @@ static void ce_smudge_racily_clean_entry(struct cache_entry *ce)
 	 * that hasn't changed checking the sha1.
 	 */
 	ce->ce_flags |= CE_SMUDGED;
+	ce->changed = 1;
 }
 
 static char *super_directory(char *filename)
@@ -1231,6 +1210,103 @@ static int write_resolve_undo(struct index_state *istate,
 	return 0;
 }
 
+static int write_ce_if_necessary(struct cache_entry *ce, void *cb_data)
+{
+	int *fdx = cb_data, pathlen, size;
+	int fd = *fdx;
+	char *dir;
+	struct ondisk_cache_entry *ondisk;
+	uint32_t crc;
+
+	assert(ce->entry_pos != 0);
+	/* TODO I'm just using the_index out of laziness here */
+	if (!ce_uptodate(ce) && is_racy_timestamp(&the_index, ce))
+		ce_smudge_racily_clean_entry(ce);
+	if (!ce->changed)
+		return 0;
+	if (is_null_sha1(ce->sha1)) {
+		static const char msg[] = "cache entry has null sha1: %s";
+		static int allow = -1;
+
+		if (allow < 0)
+			allow = git_env_bool("GIT_ALLOW_NULL_SHA1", 0);
+		if (allow)
+			warning(msg, ce->name);
+		else
+			return error(msg, ce->name);
+	}
+	dir = super_directory(ce->name);
+	pathlen = dir ? strlen(dir) + 1 : 0;
+	size = offsetof(struct ondisk_cache_entry, name) +
+		ce_namelen(ce) - pathlen + 1;
+	ondisk = xmalloc(size);
+	
+	crc = 0;
+	ondisk_from_cache_entry(ce, ondisk, pathlen);
+	if (lseek(fd, ce->entry_pos, SEEK_SET) < ce->entry_pos)
+		die("error ce seeking");
+	if (ce_write(&crc, fd, ondisk, size) < 0)
+		return -1;
+	crc = htonl(crc);
+	if (ce_write(NULL, fd, &crc, 4) < 0)
+		return -1;
+	return ce_flush(fd);
+}
+
+static void ondisk_from_directory_entry_partial(struct directory_entry *de,
+						struct ondisk_directory_entry *ondisk)
+{
+	ondisk->foffset   = htonl(de->de_foffset);
+	ondisk->nsubtrees = htonl(de->de_nsubtrees);
+	ondisk->nfiles    = htonl(de->de_nfiles);
+	ondisk->nentries  = htonl(de->de_nentries);
+	hashcpy(ondisk->sha1, de->sha1);
+	ondisk->flags     = htons(de->de_flags);
+	if (de->de_pathlen == 0) {
+		memcpy(ondisk->name, "\0", 1);
+	} else {
+		memcpy(ondisk->name, de->pathname, de->de_pathlen);
+		memcpy(ondisk->name + de->de_pathlen - 1, "/\0", 2);
+	}
+}
+
+static int write_directories_partial(struct directory_entry *de, int fd)
+{
+	int ondisk_size = offsetof(struct ondisk_directory_entry, name);
+	int size = ondisk_size + de->de_pathlen + 1;
+	int i;
+	uint32_t crc;
+	struct ondisk_directory_entry *ondisk;
+
+	if (de->changed) {
+		crc = 0;
+		ondisk = xmalloc(size);
+		ondisk_from_directory_entry_partial(de, ondisk);
+		if (lseek(fd, de->entry_pos, SEEK_SET) < de->entry_pos)
+			die("error directory seeking");
+		if (ce_write(&crc, fd, ondisk, size) < 0)
+			return -1;
+		crc = htonl(crc);
+		if (ce_write(NULL, fd, &crc, 4) < 0)
+			return -1;
+		free(ondisk);
+		if (ce_flush(fd) < 0)
+			return -1;
+	}
+	for (i = 0; i < de->de_nsubtrees; i++) {
+		if (write_directories_partial(de->sub[i], fd) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+static int write_partial(struct index_state *istate, int fd)
+{
+	write_directories_partial(istate->root_directory, fd);
+
+	return for_each_index_entry(istate, write_ce_if_necessary, &fd);
+}
+
 static int write_index_v5(struct index_state *istate, int newfd)
 {
 	struct cache_header hdr;
@@ -1296,9 +1372,33 @@ static int write_index_v5(struct index_state *istate, int newfd)
 	return ce_flush(newfd);
 }
 
+static int write_index_partial_v5(struct index_state *istate, int newfd)
+{
+	int fd;
+	char *path = get_index_file();
+
+	if (istate->needs_rewrite || istate->cache_nr == 0)
+		return write_index_v5(istate, newfd);
+	if (istate->filter_opts && istate->needs_rewrite)
+		die("BUG: cannot write a partially read index");
+	fd = open(path, O_RDWR, 0666);
+	if (fd < 0) {
+		if (errno == ENOENT)
+			die("no index file exists cannot do a partial write");
+		die_errno("index file opening for writing failed");
+	}
+
+	if (write_partial(istate, fd) < 0)
+		return -1;
+	if (ce_flush(fd) < 0)
+		return -1;
+	return 1;
+}
+
 struct index_ops v5_ops = {
 	match_stat_basic,
 	verify_hdr,
 	read_index_v5,
-	write_index_v5
+	write_index_v5,
+	write_index_partial_v5
 };
diff --git a/read-cache.c b/read-cache.c
index 04430e5..1cad0e2 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -32,6 +32,9 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
 
 	remove_name_hash(istate, old);
 	set_index_entry(istate, nr, ce);
+	ce->changed = 1;
+	if (ce->entry_pos == 0)
+		istate->needs_rewrite = 1;
 	istate->cache_changed = 1;
 }
 
@@ -494,6 +497,7 @@ int remove_index_entry_at(struct index_state *istate, int pos)
 
 	record_resolve_undo(istate, ce);
 	remove_name_hash(istate, ce);
+	istate->needs_rewrite = 1;
 	istate->cache_changed = 1;
 	istate->cache_nr--;
 	if (pos >= istate->cache_nr)
@@ -520,6 +524,7 @@ void remove_marked_cache_entries(struct index_state *istate)
 		else
 			ce_array[j++] = ce_array[i];
 	}
+	istate->needs_rewrite = 1;
 	istate->cache_changed = 1;
 	istate->cache_nr = j;
 }
@@ -1024,6 +1029,7 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
 			istate->cache + pos,
 			(istate->cache_nr - pos - 1) * sizeof(ce));
 	set_index_entry(istate, pos, ce);
+	istate->needs_rewrite = 1;
 	istate->cache_changed = 1;
 	return 0;
 }
@@ -1108,6 +1114,8 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 	size = ce_size(ce);
 	updated = xmalloc(size);
 	memcpy(updated, ce, size);
+	updated->changed = 1;
+	updated->entry_pos = ce->entry_pos;
 	fill_stat_cache_info(updated, &st);
 	/*
 	 * If ignore_valid is not set, we should leave CE_VALID bit
@@ -1201,6 +1209,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 				 * means the index is not valid anymore.
 				 */
 				ce->ce_flags &= ~CE_VALID;
+				/* TODO: remove this maybe? */
+				istate->needs_rewrite = 1;
 				istate->cache_changed = 1;
 			}
 			if (quiet)
@@ -1241,6 +1251,8 @@ void initialize_index(struct index_state *istate, int version)
 			version = atoi(envversion);
 	}
 	istate->version = version;
+	istate->needs_rewrite = 0;
+	istate->root_directory = NULL;
 	set_istate_ops(istate);
 }
 
@@ -1427,6 +1439,16 @@ int is_index_unborn(struct index_state *istate)
 	return (!istate->cache_nr && !istate->timestamp.sec);
 }
 
+static void free_directory_tree(struct directory_entry *de) {
+	int i;
+
+	if (!de)
+		return;
+	for (i = 0; i < de->de_pathlen; i++)
+		free_directory_tree(de->sub[i]);
+	free(de);
+}
+
 int discard_index(struct index_state *istate)
 {
 	int i;
@@ -1435,6 +1457,7 @@ int discard_index(struct index_state *istate)
 		free(istate->cache[i]);
 	resolve_undo_clear_index(istate);
 	istate->cache_nr = 0;
+	istate->needs_rewrite = 0;
 	istate->cache_changed = 0;
 	istate->timestamp.sec = 0;
 	istate->timestamp.nsec = 0;
@@ -1446,6 +1469,8 @@ int discard_index(struct index_state *istate)
 	istate->cache_alloc = 0;
 	istate->ops = NULL;
 	istate->filter_opts = NULL;
+	free_directory_tree(istate->root_directory);
+	istate->root_directory = NULL;
 	return 0;
 }
 
@@ -1491,6 +1516,11 @@ int write_index(struct index_state *istate, int newfd)
 	return istate->ops->write_index(istate, newfd);
 }
 
+int write_index_partial(struct index_state *istate, int newfd)
+{
+	return istate->ops->write_index_partial(istate, newfd);
+}
+
 /*
  * Read the index file that is potentially unmerged into given
  * index_state, dropping any unmerged entries.  Returns true if
diff --git a/read-cache.h b/read-cache.h
index 9d66df6..e7f36ae 100644
--- a/read-cache.h
+++ b/read-cache.h
@@ -31,6 +31,7 @@ struct index_ops {
 	int (*read_index)(struct index_state *istate, void *mmap, unsigned long mmap_size,
 			  struct filter_opts *opts);
 	int (*write_index)(struct index_state *istate, int newfd);
+	int (*write_index_partial)(struct index_state *istate, int newfd);
 };
 
 extern struct index_ops v2_ops;
diff --git a/resolve-undo.c b/resolve-undo.c
index c09b006..c496c20 100644
--- a/resolve-undo.c
+++ b/resolve-undo.c
@@ -110,6 +110,7 @@ void resolve_undo_clear_index(struct index_state *istate)
 	string_list_clear(resolve_undo, 1);
 	free(resolve_undo);
 	istate->resolve_undo = NULL;
+	istate->needs_rewrite = 1;
 	istate->cache_changed = 1;
 }
 
-- 
1.8.4.2

* [PATCH v4 24/24] perf: add partial writing test
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (22 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 23/24] POC for partial writing Thomas Gummerer
@ 2013-11-27 12:00 ` Thomas Gummerer
  2013-12-09 10:14 ` [PATCH v4 00/24] Index-v5 Thomas Gummerer
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-27 12:00 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, gitster, tr, mhagger, pclouds, robin.rosenberg,
	sunshine, ramsay

Add a test that uses update-index and exercises the partial writing code
path.
---
 t/perf/p0003-index.sh | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/t/perf/p0003-index.sh b/t/perf/p0003-index.sh
index 5360175..d1f590b 100755
--- a/t/perf/p0003-index.sh
+++ b/t/perf/p0003-index.sh
@@ -5,6 +5,7 @@ test_description="Tests index versions [23]/4/5"
 . ./perf-lib.sh
 
 test_perf_large_repo
+test_checkout_worktree
 
 test_expect_success "convert to v3" "
 	git update-index --index-version=2
@@ -15,6 +16,7 @@ test_perf "v[23]: update-index" "
 "
 
 subdir=$(git ls-files | sed 's#/[^/]*$##' | grep -v '^$' | uniq | tail -n 30 | head -1)
+file=$(git ls-files | tail -n 30 | head -1)
 
 test_perf "v[23]: grep nonexistent -- subdir" "
 	test_must_fail git grep nonexistent -- $subdir >/dev/null
@@ -24,6 +26,16 @@ test_perf "v[23]: ls-files -- subdir" "
 	git ls-files $subdir >/dev/null
 "
 
+test_expect_success "v[23] update-index prepare" "
+	echo x >$file
+"
+
+test_perf_cleanup "v[23] update-index" "
+	git update-index $file
+" "
+	git reset
+"
+
 test_expect_success "convert to v4" "
 	git update-index --index-version=4
 "
@@ -40,6 +52,17 @@ test_perf "v4: ls-files -- subdir" "
 	git ls-files $subdir >/dev/null
 "
 
+test_expect_success "v4 update-index prepare" "
+	echo x >$file
+"
+
+test_perf_cleanup "v4 update-index" "
+	git update-index $file
+" "
+	git reset
+"
+
+
 test_expect_success "convert to v5" "
 	git update-index --index-version=5
 "
@@ -60,4 +83,14 @@ test_perf "v5: ls-files -- subdir" "
 	git ls-files $subdir >/dev/null
 "
 
+test_expect_success "v5 update-index prepare" "
+	echo x >$file
+"
+
+test_perf_cleanup "v5 update-index" "
+	git update-index $file
+" "
+	git reset
+"
+
 test_done
-- 
1.8.4.2

* Re: [PATCH v4 20/24] introduce GIT_INDEX_VERSION environment variable
  2013-11-27 12:00 ` [PATCH v4 20/24] introduce GIT_INDEX_VERSION environment variable Thomas Gummerer
@ 2013-11-27 21:57   ` Eric Sunshine
  2013-11-27 22:08     ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Sunshine @ 2013-11-27 21:57 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git List, Junio C Hamano, Thomas Rast, Michael Haggerty,
	Nguyễn Thái Ngọc Duy, Robin Rosenberg,
	Ramsay Jones

On Wed, Nov 27, 2013 at 7:00 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> Respect a GIT_INDEX_VERSION environment variable, when a new index is
> initialized.  Setting the environment variable will not cause existing
> index files to be converted to another format for additional safety.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
> diff --git a/read-cache.c b/read-cache.c
> index 46551af..04430e5 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1233,8 +1233,13 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
>  void initialize_index(struct index_state *istate, int version)
>  {
>         istate->initialized = 1;
> -       if (!version)
> -               version = INDEX_FORMAT_DEFAULT;
> +       if (!version) {
> +               char *envversion = getenv("GIT_INDEX_VERSION");
> +               if (!envversion)
> +                       version = INDEX_FORMAT_DEFAULT;
> +               else
> +                       version = atoi(envversion);

Do you want to check that atoi() returned a valid value and emit a
diagnostic if it did not?

> +       }
>         istate->version = version;
>         set_istate_ops(istate);
>  }
> --
> 1.8.4.2

* Re: [PATCH v4 20/24] introduce GIT_INDEX_VERSION environment variable
  2013-11-27 21:57   ` Eric Sunshine
@ 2013-11-27 22:08     ` Junio C Hamano
  2013-11-28  9:57       ` Thomas Gummerer
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2013-11-27 22:08 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Thomas Gummerer, Git List, Thomas Rast, Michael Haggerty,
	Nguyễn Thái Ngọc Duy, Robin Rosenberg,
	Ramsay Jones

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Wed, Nov 27, 2013 at 7:00 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> Respect a GIT_INDEX_VERSION environment variable, when a new index is
>> initialized.  Setting the environment variable will not cause existing
>> index files to be converted to another format for additional safety.
>>
>> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
>> ---
>> diff --git a/read-cache.c b/read-cache.c
>> index 46551af..04430e5 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1233,8 +1233,13 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
>>  void initialize_index(struct index_state *istate, int version)
>>  {
>>         istate->initialized = 1;
>> -       if (!version)
>> -               version = INDEX_FORMAT_DEFAULT;
>> +       if (!version) {
>> +               char *envversion = getenv("GIT_INDEX_VERSION");
>> +               if (!envversion)
>> +                       version = INDEX_FORMAT_DEFAULT;
>> +               else
>> +                       version = atoi(envversion);
>
> Do you want to check that atoi() returned a valid value and emit a
> diagnostic if it did not?


Good eyes.

We use strtoul() instead of atoi() for this kind of thing, so that the
format can be checked.  The code also needs to make sure that the value
obtained this way is among the versions that are supported.
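
A minimal sketch of what such a check could look like, assuming the
supported range is INDEX_FORMAT_LB..INDEX_FORMAT_UB.  The numeric values
below, the helper name index_version_from_env() and the warning text are
made up for illustration and are not taken from the series:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed values for illustration; the real ones live in git's headers. */
#define INDEX_FORMAT_LB 2
#define INDEX_FORMAT_UB 5
#define INDEX_FORMAT_DEFAULT 3

/*
 * Hypothetical helper: turn the text of GIT_INDEX_VERSION into a
 * supported version number, or fall back to the default with a warning.
 */
static int index_version_from_env(const char *envversion)
{
	char *endp;
	unsigned long version;

	if (!envversion)
		return INDEX_FORMAT_DEFAULT;

	errno = 0;
	version = strtoul(envversion, &endp, 10);
	/* Reject empty strings, trailing garbage, overflow and bad versions. */
	if (endp == envversion || *endp || errno ||
	    version < INDEX_FORMAT_LB || version > INDEX_FORMAT_UB) {
		fprintf(stderr, "warning: GIT_INDEX_VERSION set, but the value "
			"is invalid; using version %d\n", INDEX_FORMAT_DEFAULT);
		return INDEX_FORMAT_DEFAULT;
	}
	return (int)version;
}
```

With something along these lines, the !version branch of
initialize_index() could pass getenv("GIT_INDEX_VERSION") to the helper
instead of calling atoi() directly.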

Thanks.

* Re: [PATCH v4 20/24] introduce GIT_INDEX_VERSION environment variable
  2013-11-27 22:08     ` Junio C Hamano
@ 2013-11-28  9:57       ` Thomas Gummerer
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-28  9:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Eric Sunshine, Git List, Thomas Rast, Michael Haggerty,
	Nguyễn Thái Ngọc Duy, Robin Rosenberg,
	Ramsay Jones

On 11/27, Junio C Hamano wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> 
> > On Wed, Nov 27, 2013 at 7:00 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> >> Respect a GIT_INDEX_VERSION environment variable, when a new index is
> >> initialized.  Setting the environment variable will not cause existing
> >> index files to be converted to another format for additional safety.
> >>
> >> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> >> ---
> >> diff --git a/read-cache.c b/read-cache.c
> >> index 46551af..04430e5 100644
> >> --- a/read-cache.c
> >> +++ b/read-cache.c
> >> @@ -1233,8 +1233,13 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
> >>  void initialize_index(struct index_state *istate, int version)
> >>  {
> >>         istate->initialized = 1;
> >> -       if (!version)
> >> -               version = INDEX_FORMAT_DEFAULT;
> >> +       if (!version) {
> >> +               char *envversion = getenv("GIT_INDEX_VERSION");
> >> +               if (!envversion)
> >> +                       version = INDEX_FORMAT_DEFAULT;
> >> +               else
> >> +                       version = atoi(envversion);
> >
> > Do you want to check that atoi() returned a valid value and emit a
> > diagnostic if it did not?
> 
> 
> Good eyes.
> 
> We use strtoul() instead of atoi() for this kind of thing, so that the
> format can be checked.  The code also needs to make sure that the value
> obtained this way is among the versions that are supported.
> 
> Thanks.

Thanks both.  Will use strtoul and check the value in the re-roll.

-- 
Thomas

* Re: [PATCH v4 12/24] read-cache: read index-v5
  2013-11-27 12:00 ` [PATCH v4 12/24] read-cache: read index-v5 Thomas Gummerer
@ 2013-11-30  9:17   ` Duy Nguyen
  2013-11-30 10:40     ` Thomas Gummerer
  2013-11-30 12:19   ` Antoine Pelisse
  2013-11-30 15:26   ` Antoine Pelisse
  2 siblings, 1 reply; 41+ messages in thread
From: Duy Nguyen @ 2013-11-30  9:17 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Junio C Hamano, tr, Michael Haggerty,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

On Wed, Nov 27, 2013 at 7:00 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> --- a/cache.h
> +++ b/cache.h
> @@ -132,11 +141,17 @@ struct cache_entry {
>         char name[FLEX_ARRAY]; /* more */
>  };
>
> +#define CE_NAMEMASK  (0x0fff)

CE_NAMEMASK is redefined in read-cache-v2.c in "read-cache: move index
v2 specific functions to their own file". My gcc is smart enough to
see the two defines are about the same value and does not warn me. But
we should remove one (likely this one as I see no use of this macro
outside read-cache-v2.c)

>  #define CE_STAGEMASK (0x3000)
>  #define CE_EXTENDED  (0x4000)
>  #define CE_VALID     (0x8000)
> +#define CE_SMUDGED   (0x0400) /* index v5 only flag */
>  #define CE_STAGESHIFT 12
>
> +#define CONFLICT_CONFLICTED (0x8000)
> +#define CONFLICT_STAGESHIFT 13
> +#define CONFLICT_STAGEMASK (0x6000)
> +
>  /*
>   * Range 0xFFFF0000 in ce_flags is divided into
>   * two parts: in-memory flags and on-disk ones.

> diff --git a/read-cache-v5.c b/read-cache-v5.c
> new file mode 100644
> index 0000000..9d8c8f0
> --- /dev/null
> +++ b/read-cache-v5.c
> +static int read_index_v5(struct index_state *istate, void *mmap,
> +                        unsigned long mmap_size, struct filter_opts *opts)
> +{
> +       unsigned int entry_offset, foffsetblock, nr = 0, *extoffsets;
> +       unsigned int dir_offset, dir_table_offset;
> +       int need_root = 0, i;
> +       uint32_t *offset;
> +       struct directory_entry *root_directory, *de, *last_de;
> +       const char **paths = NULL;
> +       struct pathspec adjusted_pathspec;
> +       struct cache_header *hdr;
> +       struct cache_header_v5 *hdr_v5;
> +
> +       hdr = mmap;
> +       hdr_v5 = ptr_add(mmap, sizeof(*hdr));
> +       istate->cache_alloc = alloc_nr(ntohl(hdr->hdr_entries));
> +       istate->cache = xcalloc(istate->cache_alloc, sizeof(struct cache_entry *));
> +       extoffsets = xcalloc(ntohl(hdr_v5->hdr_nextension), sizeof(int));
> +       for (i = 0; i < ntohl(hdr_v5->hdr_nextension); i++) {
> +               offset = ptr_add(mmap, sizeof(*hdr) + sizeof(*hdr_v5));
> +               extoffsets[i] = htonl(*offset);
> +       }
> +
> +       /* Skip size of the header + crc sum + size of offsets to extensions + size of offsets */
> +       dir_offset = sizeof(*hdr) + sizeof(*hdr_v5) + ntohl(hdr_v5->hdr_nextension) * 4 + 4
> +               + (ntohl(hdr_v5->hdr_ndir) + 1) * 4;
> +       dir_table_offset = sizeof(*hdr) + sizeof(*hdr_v5) + ntohl(hdr_v5->hdr_nextension) * 4 + 4;
> +       root_directory = read_directories(&dir_offset, &dir_table_offset,
> +                                         mmap, mmap_size);
> +
> +       entry_offset = ntohl(hdr_v5->hdr_fblockoffset);
> +       foffsetblock = dir_offset;
> +
> +       if (opts && opts->pathspec && opts->pathspec->nr) {
> +               paths = xmalloc((opts->pathspec->nr + 1)*sizeof(char *));
> +               paths[opts->pathspec->nr] = NULL;

Put this statement here

GUARD_PATHSPEC(opts->pathspec,
      PATHSPEC_FROMTOP |
      PATHSPEC_MAXDEPTH |
      PATHSPEC_LITERAL |
      PATHSPEC_GLOB |
      PATHSPEC_ICASE);

This says the mentioned magic is safe in this code. New magic may or
may not be and needs to be checked (soonest by me, I'm going to add
negative pathspec and I'll need to look into how it should be handled
in this code block).

> +               for (i = 0; i < opts->pathspec->nr; i++) {
> +                       char *super = strdup(opts->pathspec->items[i].match);
> +                       int len = strlen(super);

You should only check as far as items[i].nowildcard_len, not strlen().
The rest could be wildcards and stuff and not so reliable.
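
A rough sketch of the idea, assuming the caller passes in the pathspec's
match string together with its nowildcard_len.  The helper name
literal_super_directory() is hypothetical, and the sketch uses plain
malloc() only to stay self-contained:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated copy of the directory
 * that contains the literal (wildcard-free) prefix of a pathspec.
 * Only the first nowildcard_len bytes of match are examined.
 * Returns NULL when that directory is the root (or on allocation
 * failure), in which case the caller should read everything.
 */
static char *literal_super_directory(const char *match, size_t nowildcard_len)
{
	size_t len = nowildcard_len;
	char *dir;

	/* Back up to the last '/' inside the literal prefix. */
	while (len && match[len - 1] != '/')
		len--;
	/* Drop that slash and any duplicate slashes before it. */
	while (len && match[len - 1] == '/')
		len--;
	if (!len)
		return NULL;

	dir = malloc(len + 1);
	if (!dir)
		return NULL;
	memcpy(dir, match, len);
	dir[len] = '\0';
	return dir;
}
```

For "sub/dir/*.c" with nowildcard_len 8 this yields "sub/dir", and for
"sub/di*" with nowildcard_len 6 it yields "sub", so the wildcard part
never influences which directories are read.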

> +                       while (len && super[len - 1] == '/' && super[len - 2] == '/')
> +                               super[--len] = '\0'; /* strip all but one trailing slash */
> +                       while (len && super[--len] != '/')
> +                               ; /* scan backwards to next / */
> +                       if (len >= 0)
> +                               super[len--] = '\0';
> +                       if (len <= 0) {
> +                               need_root = 1;
> +                               break;
> +                       }
> +                       paths[i] = super;
> +               }

And maybe put the comment "FIXME: consider merging this code with
create_simplify() in dir.c" somewhere. It's for me to look for things
to do when I'm bored ;-)

> +       }
> +
> +       if (!need_root)
> +               parse_pathspec(&adjusted_pathspec, PATHSPEC_ALL_MAGIC, PATHSPEC_PREFER_CWD, NULL, paths);

I would go with PATHSPEC_PREFER_FULL instead of _CWD as it's safer.
Looking only at this function without caller context, it's hard to say
if _CWD is the right choice.

> +
> +       de = root_directory;
> +       last_de = de;

This statement is redundant. last_de is only used in one code block
below and it's always re-initialized before entering the loop to skip
subdirs.

> +       while (de) {
> +               if (need_root ||
> +                   match_pathspec_depth(&adjusted_pathspec, de->pathname, de->de_pathlen, 0, NULL)) {
> +                       if (read_entries(istate, de, entry_offset,
> +                                        mmap, mmap_size, &nr,
> +                                        foffsetblock) < 0)
> +                               return -1;
> +               } else {
> +                       last_de = de;
> +                       for (i = 0; i < de->de_nsubtrees; i++) {
> +                               de->sub[i]->next = last_de->next;
> +                               last_de->next = de->sub[i];
> +                               last_de = last_de->next;
> +                       }
> +               }
> +               de = de->next;
> +       }
> +       free_directory_tree(root_directory);
> +       istate->cache_nr = nr;
> +       return 0;
> +}
-- 
Duy

* Re: [PATCH v4 09/24] ls-files.c: use index api
  2013-11-27 12:00 ` [PATCH v4 09/24] ls-files.c: " Thomas Gummerer
@ 2013-11-30  9:17   ` Duy Nguyen
  2013-11-30 10:30     ` Thomas Gummerer
  2013-11-30 15:39   ` Antoine Pelisse
  1 sibling, 1 reply; 41+ messages in thread
From: Duy Nguyen @ 2013-11-30  9:17 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Junio C Hamano, tr, Michael Haggerty,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

On Wed, Nov 27, 2013 at 7:00 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> @@ -447,6 +463,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>         struct dir_struct dir;
>         struct exclude_list *el;
>         struct string_list exclude_list = STRING_LIST_INIT_NODUP;
> +       struct filter_opts *opts = xmalloc(sizeof(*opts));
>         struct option builtin_ls_files_options[] = {
>                 { OPTION_CALLBACK, 'z', NULL, NULL, NULL,
>                         N_("paths are separated with NUL character"),
> @@ -512,9 +529,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>                 prefix_len = strlen(prefix);
>         git_config(git_default_config, NULL);
>
> -       if (read_cache() < 0)
> -               die("index file corrupt");
> -
>         argc = parse_options(argc, argv, prefix, builtin_ls_files_options,
>                         ls_files_usage, 0);
>         el = add_exclude_list(&dir, EXC_CMDL, "--exclude option");
> @@ -550,6 +564,24 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>                        PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
>                        prefix, argv);
>
> +       if (!with_tree && !needs_trailing_slash_stripped()) {
> +               memset(opts, 0, sizeof(*opts));
> +               opts->pathspec = &pathspec;
> +               opts->read_staged = 1;
> +               if (show_resolve_undo)
> +                       opts->read_resolve_undo = 1;
> +               if (read_cache_filtered(opts) < 0)
> +                       die("index file corrupt");
> +       } else {
> +               if (read_cache() < 0)
> +                       die("index file corrupt");
> +               parse_pathspec(&pathspec, 0,
> +                              PATHSPEC_PREFER_CWD |
> +                              PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
> +                              prefix, argv);

So we ran parse_pathspec() once (not shown in the context); if it found
trailing slashes, we read the full cache and rerun parse_pathspec()
because the index was not loaded on the first run.

This is fine. Just a note for future improvement: as _SLASH_CHEAP only
needs to look at a few entries with cache_name_pos(), we could take
advantage of v5 to peek individual entries (or in v2, load full cache
first). Nothing needs to be done now, I think we have not decided
whether to combine _SLASH_CHEAP and _SLASH_EXPENSIVE into one.

> +       }
> +
>         /* Find common prefix for all pathspec's */
>         max_prefix = common_prefix(&pathspec);
>         max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
-- 
Duy

* Re: [PATCH v4 23/24] POC for partial writing
  2013-11-27 12:00 ` [PATCH v4 23/24] POC for partial writing Thomas Gummerer
@ 2013-11-30  9:58   ` Duy Nguyen
  2013-11-30 10:50     ` Thomas Gummerer
  0 siblings, 1 reply; 41+ messages in thread
From: Duy Nguyen @ 2013-11-30  9:58 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Junio C Hamano, tr, Michael Haggerty,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

On Wed, Nov 27, 2013 at 7:00 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> This makes update-index use both partial reading and partial writing.
> Partial reading is only used if no option other than the paths is passed to
> the command.
>
> This passes the test suite,

Just checking, the test suite was run with TEST_GIT_INDEX_VERSION=5, right?

> but doesn't behave correctly when a write
> fails.  A log should be written to the lock file, in order to be able to
> recover if a write fails.

From the API point of view this looks nice (though you should have hidden
needs_rewrite = 1 in cache_tree_invalidate_path and change_cache_version).
We could support partial file removal too by marking removed
files "removed", but that impacts the reading code and may have bad
interaction with cache_tree_invalidate_path/needs_rewrite. Probably not
worth the effort until someone shows us they remove stuff often.
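
To illustrate the "hide it" suggestion: a small sketch of a helper that
sets both flags in one place, so callers do not have to remember to set
needs_rewrite next to cache_changed.  The struct is a stand-in for the
two index_state fields involved, and the name
mark_index_for_full_rewrite() is hypothetical:

```c
/* Stand-in for the two index_state fields this sketch touches. */
struct index_state_sketch {
	unsigned int cache_changed;
	unsigned int needs_rewrite : 1;
};

/*
 * Hypothetical helper: record that the index changed in a way partial
 * writing cannot handle, so the next write must rewrite the whole file.
 */
static void mark_index_for_full_rewrite(struct index_state_sketch *istate)
{
	istate->needs_rewrite = 1;
	istate->cache_changed = 1;
}
```

Functions that invalidate the cache-tree or change the index version
could call such a helper internally instead of each call site in
update-index setting the_index.needs_rewrite by hand.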

> ---
>  builtin/update-index.c |  43 +++++++++++---
>  cache-tree.c           |  13 +++++
>  cache-tree.h           |   1 +
>  cache.h                |  27 ++++++++-
>  lockfile.c             |   2 +-
>  read-cache-v2.c        |   2 +
>  read-cache-v5.c        | 154 ++++++++++++++++++++++++++++++++++++++++---------
>  read-cache.c           |  30 ++++++++++
>  read-cache.h           |   1 +
>  resolve-undo.c         |   1 +
>  10 files changed, 237 insertions(+), 37 deletions(-)
>
> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index 8b3f7a0..69f0949 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -56,6 +56,7 @@ static int mark_ce_flags(const char *path, int flag, int mark)
>                 else
>                         active_cache[pos]->ce_flags &= ~flag;
>                 cache_tree_invalidate_path(active_cache_tree, path);
> +               the_index.needs_rewrite = 1;
>                 active_cache_changed = 1;
>                 return 0;
>         }
> @@ -99,6 +100,8 @@ static int add_one_path(const struct cache_entry *old, const char *path, int len
>         memcpy(ce->name, path, len);
>         ce->ce_flags = create_ce_flags(0);
>         ce->ce_namelen = len;
> +       if (old)
> +               ce->entry_pos = old->entry_pos;
>         fill_stat_cache_info(ce, st);
>         ce->ce_mode = ce_mode_from_stat(old, st->st_mode);
>
> @@ -268,6 +271,7 @@ static void chmod_path(int flip, const char *path)
>                 goto fail;
>         }
>         cache_tree_invalidate_path(active_cache_tree, path);
> +       the_index.needs_rewrite = 1;
>         active_cache_changed = 1;
>         report("chmod %cx '%s'", flip, path);
>         return;
> @@ -706,15 +710,18 @@ static int reupdate_callback(struct parse_opt_ctx_t *ctx,
>
>  int cmd_update_index(int argc, const char **argv, const char *prefix)
>  {
> -       int newfd, entries, has_errors = 0, line_termination = '\n';
> +       int newfd, has_errors = 0, line_termination = '\n';
>         int read_from_stdin = 0;
>         int prefix_length = prefix ? strlen(prefix) : 0;
>         int preferred_index_format = 0;
>         char set_executable_bit = 0;
>         struct refresh_params refresh_args = {0, &has_errors};
>         int lock_error = 0;
> +       struct filter_opts opts;
> +       struct pathspec pathspec;
>         struct lock_file *lock_file;
>         struct parse_opt_ctx_t ctx;
> +       int i, needs_full_read = 0;
>         int parseopt_state = PARSE_OPT_UNKNOWN;
>         struct option options[] = {
>                 OPT_BIT('q', NULL, &refresh_args.flags,
> @@ -810,9 +817,23 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>         if (newfd < 0)
>                 lock_error = errno;
>
> -       entries = read_cache();
> -       if (entries < 0)
> -               die("cache corrupted");
> +       for (i = 0; i < argc; i++) {
> +               if (!prefixcmp(argv[i], "--"))
> +                       needs_full_read = 1;
> +       }
> +       if (!needs_full_read) {
> +               memset(&opts, 0, sizeof(struct filter_opts));
> +               parse_pathspec(&pathspec, 0,
> +                              PATHSPEC_PREFER_CWD |
> +                              PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
> +                              prefix, argv + 1);
> +               opts.pathspec = &pathspec;
> +               if (read_cache_filtered(&opts) < 0)
> +                       die("cache corrupted");
> +       } else {
> +               if (read_cache() < 0)
> +                       die("cache corrupted");
> +       }
>
>         /*
>          * Custom copy of parse_options() because we want to handle
> @@ -862,6 +883,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>                             preferred_index_format,
>                             INDEX_FORMAT_LB, INDEX_FORMAT_UB);
>
> +               the_index.needs_rewrite = 1;
>                 active_cache_changed = 1;
>                 change_cache_version(preferred_index_format);
>         }
> @@ -890,17 +912,22 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>         }
>
>         if (active_cache_changed) {
> +               int r;
>                 if (newfd < 0) {
>                         if (refresh_args.flags & REFRESH_QUIET)
>                                 exit(128);
>                         unable_to_lock_index_die(get_index_file(), lock_error);
>                 }
> -               if (write_cache(newfd, active_cache, active_nr) ||
> -                   commit_locked_index(lock_file))
> +               r = write_cache_partial(newfd);
> +               if (r < 0)
>                         die("Unable to write new index file");
> +               else if (r == 0)
> +                       commit_lock_file(lock_file);
> +               else
> +                       remove_lock_file();
> +       } else {
> +               rollback_lock_file(lock_file);
>         }
>
> -       rollback_lock_file(lock_file);
> -
>         return has_errors ? 1 : 0;
>  }
> diff --git a/cache-tree.c b/cache-tree.c
> index 1209732..a3d18bb 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -123,6 +123,15 @@ void cache_tree_invalidate_path(struct cache_tree *it, const char *path)
>                 return;
>         slash = strchr(path, '/');
>         it->entry_count = -1;
> +       /*
> +        * Mark the cache_tree directory entry as invalid too. The
> +        * entry_count defines if the tree is valid, so we don't need
> +        * to reset any other field.
> +        */
> +       if (it->de_ref) {
> +               it->de_ref->de_nentries = -1;
> +               it->de_ref->changed = 1;
> +       }
>         if (!slash) {
>                 int pos;
>                 namelen = strlen(path);
> @@ -140,6 +149,10 @@ void cache_tree_invalidate_path(struct cache_tree *it, const char *path)
>                                 sizeof(struct cache_tree_sub *) *
>                                 (it->subtree_nr - pos - 1));
>                         it->subtree_nr--;
> +                       if (it->de_ref) {
> +                               it->de_ref->de_nsubtrees--;
> +                               it->de_ref->changed = 1;
> +                       }
>                 }
>                 return;
>         }
> diff --git a/cache-tree.h b/cache-tree.h
> index 9818926..eaf14a9 100644
> --- a/cache-tree.h
> +++ b/cache-tree.h
> @@ -18,6 +18,7 @@ struct cache_tree {
>         unsigned char sha1[20];
>         int subtree_nr;
>         int subtree_alloc;
> +       struct directory_entry *de_ref;
>         struct cache_tree_sub **down;
>  };
>
> diff --git a/cache.h b/cache.h
> index 71b98cf..1a634dc 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -137,11 +137,31 @@ struct cache_entry {
>         unsigned int ce_namelen;
>         unsigned char sha1[20];
>         uint32_t ce_stat_crc;
> +       unsigned int entry_pos;
> +       unsigned int changed;
>         struct cache_entry *next; /* used by name_hash */
>         struct cache_entry *next_ce;
>         char name[FLEX_ARRAY]; /* more */
>  };
>
> +struct directory_entry {
> +       struct directory_entry **sub;
> +       struct directory_entry *next;
> +       struct directory_entry *next_hash;
> +       struct cache_entry *ce;
> +       struct cache_entry *ce_last;
> +       uint32_t de_foffset;
> +       uint32_t de_nsubtrees;
> +       uint32_t de_nfiles;
> +       uint32_t de_nentries;
> +       unsigned char sha1[20];
> +       uint16_t de_flags;
> +       uint32_t de_pathlen;
> +       uint32_t entry_pos;
> +       unsigned int changed;
> +       char pathname[FLEX_ARRAY];
> +};
> +
>  #define CE_NAMEMASK  (0x0fff)
>  #define CE_STAGEMASK (0x3000)
>  #define CE_EXTENDED  (0x4000)
> @@ -317,13 +337,15 @@ struct filter_opts {
>
>  struct index_state {
>         struct cache_entry **cache;
> +       struct directory_entry *root_directory;
>         unsigned int version;
>         unsigned int cache_nr, cache_alloc, cache_changed;
>         struct string_list *resolve_undo;
>         struct cache_tree *cache_tree;
>         struct cache_time timestamp;
>         unsigned name_hash_initialized : 1,
> -                initialized : 1;
> +                initialized : 1,
> +                needs_rewrite : 1;
>         struct hash_table name_hash;
>         struct hash_table dir_hash;
>         struct index_ops *ops;
> @@ -353,6 +375,7 @@ extern void free_name_hash(struct index_state *istate);
>  #define is_cache_unborn() is_index_unborn(&the_index)
>  #define read_cache_unmerged() read_index_unmerged(&the_index)
>  #define write_cache(newfd, cache, entries) write_index(&the_index, (newfd))
> +#define write_cache_partial(newfd) write_index_partial(&the_index, (newfd))
>  #define discard_cache() discard_index(&the_index)
>  #define unmerged_cache() unmerged_index(&the_index)
>  #define cache_name_pos(name, namelen) index_name_pos(&the_index,(name),(namelen))
> @@ -529,6 +552,7 @@ extern int read_index_from(struct index_state *, const char *path);
>  extern int is_index_unborn(struct index_state *);
>  extern int read_index_unmerged(struct index_state *);
>  extern int write_index(struct index_state *, int newfd);
> +extern int write_index_partial(struct index_state *, int newfd);
>  extern int discard_index(struct index_state *);
>  extern int unmerged_index(const struct index_state *);
>  extern int verify_path(const char *path);
> @@ -613,6 +637,7 @@ extern NORETURN void unable_to_lock_index_die(const char *path, int err);
>  extern int hold_lock_file_for_update(struct lock_file *, const char *path, int);
>  extern int hold_lock_file_for_append(struct lock_file *, const char *path, int);
>  extern int commit_lock_file(struct lock_file *);
> +extern void remove_lock_file(void);
>  extern void update_index_if_able(struct index_state *, struct lock_file *);
>
>  extern int hold_locked_index(struct lock_file *, int);
> diff --git a/lockfile.c b/lockfile.c
> index 8fbcb6a..c150e5c 100644
> --- a/lockfile.c
> +++ b/lockfile.c
> @@ -7,7 +7,7 @@
>  static struct lock_file *lock_file_list;
>  static const char *alternate_index_output;
>
> -static void remove_lock_file(void)
> +void remove_lock_file(void)
>  {
>         pid_t me = getpid();
>
> diff --git a/read-cache-v2.c b/read-cache-v2.c
> index f884c10..1fec892 100644
> --- a/read-cache-v2.c
> +++ b/read-cache-v2.c
> @@ -555,5 +555,7 @@ struct index_ops v2_ops = {
>         match_stat_basic,
>         verify_hdr,
>         read_index_v2,
> +       write_index_v2,
> +       /* Partial writing is the same as writing the full index for v2 */
>         write_index_v2
>  };
> diff --git a/read-cache-v5.c b/read-cache-v5.c
> index a5e9b5a..13436a3 100644
> --- a/read-cache-v5.c
> +++ b/read-cache-v5.c
> @@ -20,22 +20,6 @@ struct extension_header {
>         uint32_t crc;
>  };
>
> -struct directory_entry {
> -       struct directory_entry **sub;
> -       struct directory_entry *next;
> -       struct directory_entry *next_hash;
> -       struct cache_entry *ce;
> -       struct cache_entry *ce_last;
> -       uint32_t de_foffset;
> -       uint32_t de_nsubtrees;
> -       uint32_t de_nfiles;
> -       uint32_t de_nentries;
> -       unsigned char sha1[20];
> -       uint16_t de_flags;
> -       uint32_t de_pathlen;
> -       char pathname[FLEX_ARRAY];
> -};
> -
>  struct conflict_part {
>         struct conflict_part *next;
>         uint16_t flags;
> @@ -246,7 +230,7 @@ static struct directory_entry *read_directories(unsigned int *dir_offset,
>                 offsetof(struct ondisk_directory_entry, name) - 5;
>         disk_de = ptr_add(mmap, *dir_offset);
>         de = directory_entry_from_ondisk(disk_de, len);
> -
> +       de->entry_pos = *dir_offset;
>         data_len = len + 1 + offsetof(struct ondisk_directory_entry, name);
>         filecrc = ptr_add(mmap, *dir_offset + data_len);
>         if (!check_crc32(0, ptr_add(mmap, *dir_offset), data_len, ntoh_l(*filecrc)))
> @@ -281,6 +265,7 @@ static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
>         entry_offset = first_entry_offset + ntoh_l(*beginning);
>         disk_ce = ptr_add(mmap, entry_offset);
>         *ce = cache_entry_from_ondisk(disk_ce, pathname, len, pathlen);
> +       (*ce)->entry_pos = entry_offset;
>         filecrc = ptr_add(mmap, entry_offset + len + 1 + sizeof(*disk_ce));
>         if (!check_crc32(0,
>                 ptr_add(mmap, entry_offset), len + 1 + sizeof(*disk_ce),
> @@ -439,6 +424,7 @@ static struct cache_tree *convert_one(struct directory_entry *de)
>
>         it = cache_tree();
>         it->entry_count = de->de_nentries;
> +       it->de_ref = de;
>         if (0 <= it->entry_count)
>                 hashcpy(it->sha1, de->sha1);
>
> @@ -523,14 +509,6 @@ static int read_entries(struct index_state *istate, struct directory_entry *de,
>         return 0;
>  }
>
> -static void free_directory_tree(struct directory_entry *de) {
> -       int i;
> -
> -       for (i = 0; i < de->de_pathlen; i++)
> -               free_directory_tree(de->sub[i]);
> -       free(de);
> -}
> -
>  /*
>   * Read an index-v5 file filtered by the filter_opts.   If opts is NULL,
>   * everything will be read.
> @@ -626,7 +604,7 @@ static int read_index_v5(struct index_state *istate, void *mmap,
>                 }
>         }
>         istate->cache_tree = cache_tree_convert_v5(root_directory);
> -       free_directory_tree(root_directory);
> +       istate->root_directory = root_directory;
>         istate->cache_nr = nr;
>         return 0;
>  }
> @@ -696,6 +674,7 @@ static void ce_smudge_racily_clean_entry(struct cache_entry *ce)
>          * that hasn't changed checking the sha1.
>          */
>         ce->ce_flags |= CE_SMUDGED;
> +       ce->changed = 1;
>  }
>
>  static char *super_directory(char *filename)
> @@ -1231,6 +1210,103 @@ static int write_resolve_undo(struct index_state *istate,
>         return 0;
>  }
>
> +static int write_ce_if_necessary(struct cache_entry *ce, void *cb_data)
> +{
> +       int *fdx = cb_data, pathlen, size;
> +       int fd = *fdx;
> +       char *dir;
> +       struct ondisk_cache_entry *ondisk;
> +       uint32_t crc;
> +
> +       assert(ce->entry_pos != 0);
> +       /* TODO I'm just using the_index out of laziness here */
> +       if (!ce_uptodate(ce) && is_racy_timestamp(&the_index, ce))
> +               ce_smudge_racily_clean_entry(ce);
> +       if (!ce->changed)
> +               return 0;
> +       if (is_null_sha1(ce->sha1)) {
> +               static const char msg[] = "cache entry has null sha1: %s";
> +               static int allow = -1;
> +
> +               if (allow < 0)
> +                       allow = git_env_bool("GIT_ALLOW_NULL_SHA1", 0);
> +               if (allow)
> +                       warning(msg, ce->name);
> +               else
> +                       return error(msg, ce->name);
> +       }
> +       dir = super_directory(ce->name);
> +       pathlen = dir ? strlen(dir) + 1 : 0;
> +       size = offsetof(struct ondisk_cache_entry, name) +
> +               ce_namelen(ce) - pathlen + 1;
> +       ondisk = xmalloc(size);
> +
> +       crc = 0;
> +       ondisk_from_cache_entry(ce, ondisk, pathlen);
> +       if (lseek(fd, ce->entry_pos, SEEK_SET) < ce->entry_pos)
> +               die("error ce seeking");
> +       if (ce_write(&crc, fd, ondisk, size) < 0)
> +               return -1;
> +       crc = htonl(crc);
> +       if (ce_write(NULL, fd, &crc, 4) < 0)
> +               return -1;
> +       return ce_flush(fd);
> +}
> +
> +static void ondisk_from_directory_entry_partial(struct directory_entry *de,
> +                                               struct ondisk_directory_entry *ondisk)
> +{
> +       ondisk->foffset   = htonl(de->de_foffset);
> +       ondisk->nsubtrees = htonl(de->de_nsubtrees);
> +       ondisk->nfiles    = htonl(de->de_nfiles);
> +       ondisk->nentries  = htonl(de->de_nentries);
> +       hashcpy(ondisk->sha1, de->sha1);
> +       ondisk->flags     = htons(de->de_flags);
> +       if (de->de_pathlen == 0) {
> +               memcpy(ondisk->name, "\0", 1);
> +       } else {
> +               memcpy(ondisk->name, de->pathname, de->de_pathlen);
> +               memcpy(ondisk->name + de->de_pathlen - 1, "/\0", 2);
> +       }
> +}
> +
> +static int write_directories_partial(struct directory_entry *de, int fd)
> +{
> +       int ondisk_size = offsetof(struct ondisk_directory_entry, name);
> +       int size = ondisk_size + de->de_pathlen + 1;
> +       int i;
> +       uint32_t crc;
> +       struct ondisk_directory_entry *ondisk;
> +
> +       if (de->changed) {
> +               crc = 0;
> +               ondisk = xmalloc(size);
> +               ondisk_from_directory_entry_partial(de, ondisk);
> +               if (lseek(fd, de->entry_pos, SEEK_SET) < de->entry_pos)
> +                       die("error directory seeking");
> +               if (ce_write(&crc, fd, ondisk, size) < 0)
> +                       return -1;
> +               crc = htonl(crc);
> +               if (ce_write(NULL, fd, &crc, 4) < 0)
> +                       return -1;
> +               free(ondisk);
> +               if (ce_flush(fd) < 0)
> +                       return -1;
> +       }
> +       for (i = 0; i < de->de_nsubtrees; i++) {
> +               if (write_directories_partial(de->sub[i], fd) < 0)
> +                       return -1;
> +       }
> +       return 0;
> +}
> +
> +static int write_partial(struct index_state *istate, int fd)
> +{
> +       write_directories_partial(istate->root_directory, fd);
> +
> +       return for_each_index_entry(istate, write_ce_if_necessary, &fd);
> +}
> +
>  static int write_index_v5(struct index_state *istate, int newfd)
>  {
>         struct cache_header hdr;
> @@ -1296,9 +1372,33 @@ static int write_index_v5(struct index_state *istate, int newfd)
>         return ce_flush(newfd);
>  }
>
> +static int write_index_partial_v5(struct index_state *istate, int newfd)
> +{
> +       int fd;
> +       char *path = get_index_file();
> +
> +       if (istate->needs_rewrite || istate->cache_nr == 0)
> +               return write_index_v5(istate, newfd);
> +       if (istate->filter_opts && istate->needs_rewrite)
> +               die("BUG: cannot write a partially read index");
> +       fd = open(path, O_RDWR, 0666);
> +       if (fd < 0) {
> +               if (errno == ENOENT)
> +                       die("no index file exists, cannot do a partial write");
> +               die_errno("index file opening for writing failed");
> +       }
> +
> +       if (write_partial(istate, fd) < 0)
> +               return -1;
> +       if (ce_flush(fd) < 0)
> +               return -1;
> +       return 1;
> +}
> +
>  struct index_ops v5_ops = {
>         match_stat_basic,
>         verify_hdr,
>         read_index_v5,
> -       write_index_v5
> +       write_index_v5,
> +       write_index_partial_v5
>  };
> diff --git a/read-cache.c b/read-cache.c
> index 04430e5..1cad0e2 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -32,6 +32,9 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
>
>         remove_name_hash(istate, old);
>         set_index_entry(istate, nr, ce);
> +       ce->changed = 1;
> +       if (ce->entry_pos == 0)
> +               istate->needs_rewrite = 1;
>         istate->cache_changed = 1;
>  }
>
> @@ -494,6 +497,7 @@ int remove_index_entry_at(struct index_state *istate, int pos)
>
>         record_resolve_undo(istate, ce);
>         remove_name_hash(istate, ce);
> +       istate->needs_rewrite = 1;
>         istate->cache_changed = 1;
>         istate->cache_nr--;
>         if (pos >= istate->cache_nr)
> @@ -520,6 +524,7 @@ void remove_marked_cache_entries(struct index_state *istate)
>                 else
>                         ce_array[j++] = ce_array[i];
>         }
> +       istate->needs_rewrite = 1;
>         istate->cache_changed = 1;
>         istate->cache_nr = j;
>  }
> @@ -1024,6 +1029,7 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
>                         istate->cache + pos,
>                         (istate->cache_nr - pos - 1) * sizeof(ce));
>         set_index_entry(istate, pos, ce);
> +       istate->needs_rewrite = 1;
>         istate->cache_changed = 1;
>         return 0;
>  }
> @@ -1108,6 +1114,8 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
>         size = ce_size(ce);
>         updated = xmalloc(size);
>         memcpy(updated, ce, size);
> +       updated->changed = 1;
> +       updated->entry_pos = ce->entry_pos;
>         fill_stat_cache_info(updated, &st);
>         /*
>          * If ignore_valid is not set, we should leave CE_VALID bit
> @@ -1201,6 +1209,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>                                  * means the index is not valid anymore.
>                                  */
>                                 ce->ce_flags &= ~CE_VALID;
> +                               /* TODO: remove this maybe? */
> +                               istate->needs_rewrite = 1;
>                                 istate->cache_changed = 1;
>                         }
>                         if (quiet)
> @@ -1241,6 +1251,8 @@ void initialize_index(struct index_state *istate, int version)
>                         version = atoi(envversion);
>         }
>         istate->version = version;
> +       istate->needs_rewrite = 0;
> +       istate->root_directory = NULL;
>         set_istate_ops(istate);
>  }
>
> @@ -1427,6 +1439,16 @@ int is_index_unborn(struct index_state *istate)
>         return (!istate->cache_nr && !istate->timestamp.sec);
>  }
>
> +static void free_directory_tree(struct directory_entry *de) {
> +       int i;
> +
> +       if (!de)
> +               return;
> +       for (i = 0; i < de->de_nsubtrees; i++)
> +               free_directory_tree(de->sub[i]);
> +       free(de);
> +}
> +
>  int discard_index(struct index_state *istate)
>  {
>         int i;
> @@ -1435,6 +1457,7 @@ int discard_index(struct index_state *istate)
>                 free(istate->cache[i]);
>         resolve_undo_clear_index(istate);
>         istate->cache_nr = 0;
> +       istate->needs_rewrite = 0;
>         istate->cache_changed = 0;
>         istate->timestamp.sec = 0;
>         istate->timestamp.nsec = 0;
> @@ -1446,6 +1469,8 @@ int discard_index(struct index_state *istate)
>         istate->cache_alloc = 0;
>         istate->ops = NULL;
>         istate->filter_opts = NULL;
> +       free_directory_tree(istate->root_directory);
> +       istate->root_directory = NULL;
>         return 0;
>  }
>
> @@ -1491,6 +1516,11 @@ int write_index(struct index_state *istate, int newfd)
>         return istate->ops->write_index(istate, newfd);
>  }
>
> +int write_index_partial(struct index_state *istate, int newfd)
> +{
> +       return istate->ops->write_index_partial(istate, newfd);
> +}
> +
>  /*
>   * Read the index file that is potentially unmerged into given
>   * index_state, dropping any unmerged entries.  Returns true if
> diff --git a/read-cache.h b/read-cache.h
> index 9d66df6..e7f36ae 100644
> --- a/read-cache.h
> +++ b/read-cache.h
> @@ -31,6 +31,7 @@ struct index_ops {
>         int (*read_index)(struct index_state *istate, void *mmap, unsigned long mmap_size,
>                           struct filter_opts *opts);
>         int (*write_index)(struct index_state *istate, int newfd);
> +       int (*write_index_partial)(struct index_state *istate, int newfd);
>  };
>
>  extern struct index_ops v2_ops;
> diff --git a/resolve-undo.c b/resolve-undo.c
> index c09b006..c496c20 100644
> --- a/resolve-undo.c
> +++ b/resolve-undo.c
> @@ -110,6 +110,7 @@ void resolve_undo_clear_index(struct index_state *istate)
>         string_list_clear(resolve_undo, 1);
>         free(resolve_undo);
>         istate->resolve_undo = NULL;
> +       istate->needs_rewrite = 1;
>         istate->cache_changed = 1;
>  }
>
> --
> 1.8.4.2
>



-- 
Duy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 09/24] ls-files.c: use index api
  2013-11-30  9:17   ` Duy Nguyen
@ 2013-11-30 10:30     ` Thomas Gummerer
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-30 10:30 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Junio C Hamano, tr, Michael Haggerty,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

Duy Nguyen <pclouds@gmail.com> writes:

> On Wed, Nov 27, 2013 at 7:00 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> @@ -447,6 +463,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>>         struct dir_struct dir;
>>         struct exclude_list *el;
>>         struct string_list exclude_list = STRING_LIST_INIT_NODUP;
>> +       struct filter_opts *opts = xmalloc(sizeof(*opts));
>>         struct option builtin_ls_files_options[] = {
>>                 { OPTION_CALLBACK, 'z', NULL, NULL, NULL,
>>                         N_("paths are separated with NUL character"),
>> @@ -512,9 +529,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>>                 prefix_len = strlen(prefix);
>>         git_config(git_default_config, NULL);
>>
>> -       if (read_cache() < 0)
>> -               die("index file corrupt");
>> -
>>         argc = parse_options(argc, argv, prefix, builtin_ls_files_options,
>>                         ls_files_usage, 0);
>>         el = add_exclude_list(&dir, EXC_CMDL, "--exclude option");
>> @@ -550,6 +564,24 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>>                        PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
>>                        prefix, argv);
>>
>> +       if (!with_tree && !needs_trailing_slash_stripped()) {
>> +               memset(opts, 0, sizeof(*opts));
>> +               opts->pathspec = &pathspec;
>> +               opts->read_staged = 1;
>> +               if (show_resolve_undo)
>> +                       opts->read_resolve_undo = 1;
>> +               if (read_cache_filtered(opts) < 0)
>> +                       die("index file corrupt");
>> +       } else {
>> +               if (read_cache() < 0)
>> +                       die("index file corrupt");
>> +               parse_pathspec(&pathspec, 0,
>> +                              PATHSPEC_PREFER_CWD |
>> +                              PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
>> +                              prefix, argv);
>
> So we ran parse_pathspec() once (not shown in the context), if found
> trailing slashes, we read full cache and rerun parse_pathspec()
> because the index is not loaded on the first run.
>
> This is fine. Just a note for future improvement: as _SLASH_CHEAP only
> needs to look at a few entries with cache_name_pos(), we could take
> advantage of v5 to peek individual entries (or in v2, load full cache
> first). Nothing needs to be done now, I think we have not decided
> whether to combine _SLASH_CHEAP and _SLASH_EXPENSIVE into one.

Yes, that makes sense.  Adding the ability to look up individual path
entries without reading the whole index, or even part of it, is
something I have been thinking about, but I haven't had time to do it
yet.  I'll add this to my list of possible future improvements.
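
To make the idea of looking up a single path concrete, here is a minimal
sketch, assuming the entry names are already available as a sorted array
of strings.  find_path() and its arguments are hypothetical names used
only for illustration; they are not part of this series or of git's API.
The return convention (index when found, -pos - 1 otherwise) only mirrors
what callers of cache_name_pos() expect.

```c
#include <string.h>

/*
 * Hypothetical helper, not git code: binary-search a sorted array of
 * NUL-terminated path names.  Returns the index of `name` if present,
 * otherwise -(insertion position) - 1.
 */
int find_path(const char *const *paths, int nr, const char *name)
{
	int first = 0, last = nr;

	while (first < last) {
		int mid = first + (last - first) / 2;
		int cmp = strcmp(name, paths[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			last = mid;
		else
			first = mid + 1;
	}
	return -first - 1;
}
```

A real implementation for v5 would presumably search the on-disk
directory and file offset tables instead of an in-memory array; that
part is not sketched here.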

>> +       }
>> +
>>         /* Find common prefix for all pathspec's */
>>         max_prefix = common_prefix(&pathspec);
>>         max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
> -- 
> Duy

* Re: [PATCH v4 12/24] read-cache: read index-v5
  2013-11-30  9:17   ` Duy Nguyen
@ 2013-11-30 10:40     ` Thomas Gummerer
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-30 10:40 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Junio C Hamano, tr, Michael Haggerty,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

Duy Nguyen <pclouds@gmail.com> writes:

> On Wed, Nov 27, 2013 at 7:00 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -132,11 +141,17 @@ struct cache_entry {
>>         char name[FLEX_ARRAY]; /* more */
>>  };
>>
>> +#define CE_NAMEMASK  (0x0fff)
>
> CE_NAMEMASK is redefined in read-cache-v2.c in "read-cache: move index
> v2 specific functions to their own file". My gcc is smart enough to
> see the two defines are about the same value and does not warn me. But
> we should remove one (likely this one as I see no use of this macro
> outside read-cache-v2.c)

Thanks for catching that, there's no need to have it here.  I'll remove
it in the re-roll.

>>  #define CE_STAGEMASK (0x3000)
>>  #define CE_EXTENDED  (0x4000)
>>  #define CE_VALID     (0x8000)
>> +#define CE_SMUDGED   (0x0400) /* index v5 only flag */
>>  #define CE_STAGESHIFT 12
>>
>> +#define CONFLICT_CONFLICTED (0x8000)
>> +#define CONFLICT_STAGESHIFT 13
>> +#define CONFLICT_STAGEMASK (0x6000)
>> +
>>  /*
>>   * Range 0xFFFF0000 in ce_flags is divided into
>>   * two parts: in-memory flags and on-disk ones.
>
>> diff --git a/read-cache-v5.c b/read-cache-v5.c
>> new file mode 100644
>> index 0000000..9d8c8f0
>> --- /dev/null
>> +++ b/read-cache-v5.c
>> +static int read_index_v5(struct index_state *istate, void *mmap,
>> +                        unsigned long mmap_size, struct filter_opts *opts)
>> +{
>> +       unsigned int entry_offset, foffsetblock, nr = 0, *extoffsets;
>> +       unsigned int dir_offset, dir_table_offset;
>> +       int need_root = 0, i;
>> +       uint32_t *offset;
>> +       struct directory_entry *root_directory, *de, *last_de;
>> +       const char **paths = NULL;
>> +       struct pathspec adjusted_pathspec;
>> +       struct cache_header *hdr;
>> +       struct cache_header_v5 *hdr_v5;
>> +
>> +       hdr = mmap;
>> +       hdr_v5 = ptr_add(mmap, sizeof(*hdr));
>> +       istate->cache_alloc = alloc_nr(ntohl(hdr->hdr_entries));
>> +       istate->cache = xcalloc(istate->cache_alloc, sizeof(struct cache_entry *));
>> +       extoffsets = xcalloc(ntohl(hdr_v5->hdr_nextension), sizeof(int));
>> +       for (i = 0; i < ntohl(hdr_v5->hdr_nextension); i++) {
>> +               offset = ptr_add(mmap, sizeof(*hdr) + sizeof(*hdr_v5));
>> +               extoffsets[i] = htonl(*offset);
>> +       }
>> +
>> +       /* Skip size of the header + crc sum + size of offsets to extensions + size of offsets */
>> +       dir_offset = sizeof(*hdr) + sizeof(*hdr_v5) + ntohl(hdr_v5->hdr_nextension) * 4 + 4
>> +               + (ntohl(hdr_v5->hdr_ndir) + 1) * 4;
>> +       dir_table_offset = sizeof(*hdr) + sizeof(*hdr_v5) + ntohl(hdr_v5->hdr_nextension) * 4 + 4;
>> +       root_directory = read_directories(&dir_offset, &dir_table_offset,
>> +                                         mmap, mmap_size);
>> +
>> +       entry_offset = ntohl(hdr_v5->hdr_fblockoffset);
>> +       foffsetblock = dir_offset;
>> +
>> +       if (opts && opts->pathspec && opts->pathspec->nr) {
>> +               paths = xmalloc((opts->pathspec->nr + 1)*sizeof(char *));
>> +               paths[opts->pathspec->nr] = NULL;
>
> Put this statement here
>
> GUARD_PATHSPEC(opts->pathspec,
>       PATHSPEC_FROMTOP |
>       PATHSPEC_MAXDEPTH |
>       PATHSPEC_LITERAL |
>       PATHSPEC_GLOB |
>       PATHSPEC_ICASE);
>
> This says the mentioned magic is safe in this code. New magic may or
> may not be and needs to be checked (soonest by me, I'm going to add
> negative pathspec and I'll need to look into how it should be handled
> in this code block).

Thanks, I'll add the statement in the re-roll.

>> +               for (i = 0; i < opts->pathspec->nr; i++) {
>> +                       char *super = strdup(opts->pathspec->items[i].match);
>> +                       int len = strlen(super);
>
> You should only check as far as items[i].nowildcard_len, not strlen().
> The rest could be wildcards and stuff and not so reliable.

Ok, will change it in the re-roll.
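
As a rough illustration of the point about nowildcard_len: the sketch
below derives the directory to read from the literal prefix of a
pathspec only, so wildcard characters never end up in the directory
name.  literal_parent_dir() is a hypothetical helper written for this
example, not code from the series, and it takes the literal prefix
length as a plain argument rather than reading it from a pathspec item.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not git code: copy into `out` the directory part
 * of the first `nowildcard_len` bytes of `match`, without a trailing
 * slash.  Returns the directory length; 0 means "no usable directory",
 * i.e. the caller has to fall back to reading from the root.
 */
size_t literal_parent_dir(const char *match, size_t nowildcard_len,
			  char *out, size_t outsz)
{
	size_t len = nowildcard_len;

	if (!outsz)
		return 0;
	if (len >= outsz) {
		/* Does not fit: be conservative and ask for the root. */
		out[0] = '\0';
		return 0;
	}
	/* Scan backwards to just past the last '/' in the literal prefix. */
	while (len && match[len - 1] != '/')
		len--;
	/* Drop that slash and any duplicates before it. */
	while (len && match[len - 1] == '/')
		len--;
	memcpy(out, match, len);
	out[len] = '\0';
	return len;
}
```

With a pathspec such as "src/fo*/bar.c" and a literal prefix of six
bytes, this yields "src", whereas scanning back from strlen() would
produce "src/fo*".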

>> +                       while (len && super[len - 1] == '/' && super[len - 2] == '/')
>> +                               super[--len] = '\0'; /* strip all but one trailing slash */
>> +                       while (len && super[--len] != '/')
>> +                               ; /* scan backwards to next / */
>> +                       if (len >= 0)
>> +                               super[len--] = '\0';
>> +                       if (len <= 0) {
>> +                               need_root = 1;
>> +                               break;
>> +                       }
>> +                       paths[i] = super;
>> +               }
>
> And maybe put the comment "FIXME: consider merging this code with
> create_simplify() in dir.c" somewhere. It's for me to look for things
> to do when I'm bored ;-)

Heh, thanks, will do.

>> +       }
>> +
>> +       if (!need_root)
>> +               parse_pathspec(&adjusted_pathspec, PATHSPEC_ALL_MAGIC, PATHSPEC_PREFER_CWD, NULL, paths);
>
> I would go with PATHSPEC_PREFER_FULL instead of _CWD as it's safer.
> Looking only at this function without caller context, it's hard to say
> if _CWD is the right choice.

Ok, thanks, will change.

>> +
>> +       de = root_directory;
>> +       last_de = de;
>
> This statement is redundant. last_de is only used in one code block
> below and it's always re-initialized before entering the loop to skip
> subdirs.

Right, good catch!  Will remove it.

>> +       while (de) {
>> +               if (need_root ||
>> +                   match_pathspec_depth(&adjusted_pathspec, de->pathname, de->de_pathlen, 0, NULL)) {
>> +                       if (read_entries(istate, de, entry_offset,
>> +                                        mmap, mmap_size, &nr,
>> +                                        foffsetblock) < 0)
>> +                               return -1;
>> +               } else {
>> +                       last_de = de;
>> +                       for (i = 0; i < de->de_nsubtrees; i++) {
>> +                               de->sub[i]->next = last_de->next;
>> +                               last_de->next = de->sub[i];
>> +                               last_de = last_de->next;
>> +                       }
>> +               }
>> +               de = de->next;
>> +       }
>> +       free_directory_tree(root_directory);
>> +       istate->cache_nr = nr;
>> +       return 0;
>> +}
> -- 
> Duy

* Re: [PATCH v4 23/24] POC for partial writing
  2013-11-30  9:58   ` Duy Nguyen
@ 2013-11-30 10:50     ` Thomas Gummerer
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-30 10:50 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Junio C Hamano, tr, Michael Haggerty,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

Duy Nguyen <pclouds@gmail.com> writes:

> On Wed, Nov 27, 2013 at 7:00 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> This makes update-index use both partial reading and partial writing.
>> Partial reading is only used if no option other than the paths is passed to
>> the command.
>>
>> This passes the test suite,
>
> Just checking, the test suite was run with TEST_GIT_INDEX_VERSION=5, right?

Yes, sure, I've been using that for finding problems in my approach.

>> but doesn't behave correctly when a write
>> fails.  A log should be written to the lock file, in order to be able to
>> recover if a write fails.
>
> From the API point of view this looks nice (you should have hidden
> needs_write = 1 in cache_invalidate_path and change_cache_version
> though).

Ok, thanks, I'll look into that, once I have time to make more than a
POC for this.
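
For reference, the suggestion amounts to something like the following
toy sketch, in which the invalidation helper sets the rewrite flag
itself so that callers cannot forget it.  struct toy_index and
toy_invalidate_path() are stand-ins invented for this example; they are
not git's real structures or functions.

```c
/* Toy stand-in for the relevant bits of struct index_state. */
struct toy_index {
	unsigned cache_changed : 1;
	unsigned needs_rewrite : 1;
	int tree_entry_count;	/* -1 means the cached tree is invalid */
};

/*
 * Invalidate the cached tree and record, in one place, that the index
 * changed and that a partial write is no longer sufficient.
 */
void toy_invalidate_path(struct toy_index *istate)
{
	istate->tree_entry_count = -1;
	istate->needs_rewrite = 1;
	istate->cache_changed = 1;
}
```

Callers such as mark_ce_flags() or chmod_path() would then only call the
helper and would not need to touch needs_rewrite directly.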

> We could support partial file removal too by marking removed
> files "removed", but that impacts the reading code and may have bad
> interaction with cache_invalidate_path/needs_rewrite. Probably not
> worth the effort until someone shows us they remove stuff often.

Yes, right, I think it could be done, but I haven't taken the time to
look into it yet.  The reading code should actually be fine as it is,
as it already takes care of entries with the removed flag, but I'm not
sure about cache_tree_invalidate_path/needs_rewrite.

>> ---
>>  builtin/update-index.c |  43 +++++++++++---
>>  cache-tree.c           |  13 +++++
>>  cache-tree.h           |   1 +
>>  cache.h                |  27 ++++++++-
>>  lockfile.c             |   2 +-
>>  read-cache-v2.c        |   2 +
>>  read-cache-v5.c        | 154 ++++++++++++++++++++++++++++++++++++++++---------
>>  read-cache.c           |  30 ++++++++++
>>  read-cache.h           |   1 +
>>  resolve-undo.c         |   1 +
>>  10 files changed, 237 insertions(+), 37 deletions(-)
>>
>> diff --git a/builtin/update-index.c b/builtin/update-index.c
>> index 8b3f7a0..69f0949 100644
>> --- a/builtin/update-index.c
>> +++ b/builtin/update-index.c
>> @@ -56,6 +56,7 @@ static int mark_ce_flags(const char *path, int flag, int mark)
>>                 else
>>                         active_cache[pos]->ce_flags &= ~flag;
>>                 cache_tree_invalidate_path(active_cache_tree, path);
>> +               the_index.needs_rewrite = 1;
>>                 active_cache_changed = 1;
>>                 return 0;
>>         }
>> @@ -99,6 +100,8 @@ static int add_one_path(const struct cache_entry *old, const char *path, int len
>>         memcpy(ce->name, path, len);
>>         ce->ce_flags = create_ce_flags(0);
>>         ce->ce_namelen = len;
>> +       if (old)
>> +               ce->entry_pos = old->entry_pos;
>>         fill_stat_cache_info(ce, st);
>>         ce->ce_mode = ce_mode_from_stat(old, st->st_mode);
>>
>> @@ -268,6 +271,7 @@ static void chmod_path(int flip, const char *path)
>>                 goto fail;
>>         }
>>         cache_tree_invalidate_path(active_cache_tree, path);
>> +       the_index.needs_rewrite = 1;
>>         active_cache_changed = 1;
>>         report("chmod %cx '%s'", flip, path);
>>         return;
>> @@ -706,15 +710,18 @@ static int reupdate_callback(struct parse_opt_ctx_t *ctx,
>>
>>  int cmd_update_index(int argc, const char **argv, const char *prefix)
>>  {
>> -       int newfd, entries, has_errors = 0, line_termination = '\n';
>> +       int newfd, has_errors = 0, line_termination = '\n';
>>         int read_from_stdin = 0;
>>         int prefix_length = prefix ? strlen(prefix) : 0;
>>         int preferred_index_format = 0;
>>         char set_executable_bit = 0;
>>         struct refresh_params refresh_args = {0, &has_errors};
>>         int lock_error = 0;
>> +       struct filter_opts opts;
>> +       struct pathspec pathspec;
>>         struct lock_file *lock_file;
>>         struct parse_opt_ctx_t ctx;
>> +       int i, needs_full_read = 0;
>>         int parseopt_state = PARSE_OPT_UNKNOWN;
>>         struct option options[] = {
>>                 OPT_BIT('q', NULL, &refresh_args.flags,
>> @@ -810,9 +817,23 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>>         if (newfd < 0)
>>                 lock_error = errno;
>>
>> -       entries = read_cache();
>> -       if (entries < 0)
>> -               die("cache corrupted");
>> +       for (i = 0; i < argc; i++) {
>> +               if (!prefixcmp(argv[i], "--"))
>> +                       needs_full_read = 1;
>> +       }
>> +       if (!needs_full_read) {
>> +               memset(&opts, 0, sizeof(struct filter_opts));
>> +               parse_pathspec(&pathspec, 0,
>> +                              PATHSPEC_PREFER_CWD |
>> +                              PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
>> +                              prefix, argv + 1);
>> +               opts.pathspec = &pathspec;
>> +               if (read_cache_filtered(&opts) < 0)
>> +                       die("cache corrupted");
>> +       } else {
>> +               if (read_cache() < 0)
>> +                       die("cache corrupted");
>> +       }
>>
>>         /*
>>          * Custom copy of parse_options() because we want to handle
>> @@ -862,6 +883,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>>                             preferred_index_format,
>>                             INDEX_FORMAT_LB, INDEX_FORMAT_UB);
>>
>> +               the_index.needs_rewrite = 1;
>>                 active_cache_changed = 1;
>>                 change_cache_version(preferred_index_format);
>>         }
>> @@ -890,17 +912,22 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>>         }
>>
>>         if (active_cache_changed) {
>> +               int r;
>>                 if (newfd < 0) {
>>                         if (refresh_args.flags & REFRESH_QUIET)
>>                                 exit(128);
>>                         unable_to_lock_index_die(get_index_file(), lock_error);
>>                 }
>> -               if (write_cache(newfd, active_cache, active_nr) ||
>> -                   commit_locked_index(lock_file))
>> +               r = write_cache_partial(newfd);
>> +               if (r < 0)
>>                         die("Unable to write new index file");
>> +               else if (r == 0)
>> +                       commit_lock_file(lock_file);
>> +               else
>> +                       remove_lock_file();
>> +       } else {
>> +               rollback_lock_file(lock_file);
>>         }
>>
>> -       rollback_lock_file(lock_file);
>> -
>>         return has_errors ? 1 : 0;
>>  }
>> diff --git a/cache-tree.c b/cache-tree.c
>> index 1209732..a3d18bb 100644
>> --- a/cache-tree.c
>> +++ b/cache-tree.c
>> @@ -123,6 +123,15 @@ void cache_tree_invalidate_path(struct cache_tree *it, const char *path)
>>                 return;
>>         slash = strchr(path, '/');
>>         it->entry_count = -1;
>> +       /*
>> +        * Mark the cache_tree directory entry as invalid too. The
>> +        * entry_count defines if the tree is valid, so we don't need
>> +        * to reset any other field.
>> +        */
>> +       if (it->de_ref) {
>> +               it->de_ref->de_nentries = -1;
>> +               it->de_ref->changed = 1;
>> +       }
>>         if (!slash) {
>>                 int pos;
>>                 namelen = strlen(path);
>> @@ -140,6 +149,10 @@ void cache_tree_invalidate_path(struct cache_tree *it, const char *path)
>>                                 sizeof(struct cache_tree_sub *) *
>>                                 (it->subtree_nr - pos - 1));
>>                         it->subtree_nr--;
>> +                       if (it->de_ref) {
>> +                               it->de_ref->de_nsubtrees--;
>> +                               it->de_ref->changed = 1;
>> +                       }
>>                 }
>>                 return;
>>         }
>> diff --git a/cache-tree.h b/cache-tree.h
>> index 9818926..eaf14a9 100644
>> --- a/cache-tree.h
>> +++ b/cache-tree.h
>> @@ -18,6 +18,7 @@ struct cache_tree {
>>         unsigned char sha1[20];
>>         int subtree_nr;
>>         int subtree_alloc;
>> +       struct directory_entry *de_ref;
>>         struct cache_tree_sub **down;
>>  };
>>
>> diff --git a/cache.h b/cache.h
>> index 71b98cf..1a634dc 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -137,11 +137,31 @@ struct cache_entry {
>>         unsigned int ce_namelen;
>>         unsigned char sha1[20];
>>         uint32_t ce_stat_crc;
>> +       unsigned int entry_pos;
>> +       unsigned int changed;
>>         struct cache_entry *next; /* used by name_hash */
>>         struct cache_entry *next_ce;
>>         char name[FLEX_ARRAY]; /* more */
>>  };
>>
>> +struct directory_entry {
>> +       struct directory_entry **sub;
>> +       struct directory_entry *next;
>> +       struct directory_entry *next_hash;
>> +       struct cache_entry *ce;
>> +       struct cache_entry *ce_last;
>> +       uint32_t de_foffset;
>> +       uint32_t de_nsubtrees;
>> +       uint32_t de_nfiles;
>> +       uint32_t de_nentries;
>> +       unsigned char sha1[20];
>> +       uint16_t de_flags;
>> +       uint32_t de_pathlen;
>> +       uint32_t entry_pos;
>> +       unsigned int changed;
>> +       char pathname[FLEX_ARRAY];
>> +};
>> +
>>  #define CE_NAMEMASK  (0x0fff)
>>  #define CE_STAGEMASK (0x3000)
>>  #define CE_EXTENDED  (0x4000)
>> @@ -317,13 +337,15 @@ struct filter_opts {
>>
>>  struct index_state {
>>         struct cache_entry **cache;
>> +       struct directory_entry *root_directory;
>>         unsigned int version;
>>         unsigned int cache_nr, cache_alloc, cache_changed;
>>         struct string_list *resolve_undo;
>>         struct cache_tree *cache_tree;
>>         struct cache_time timestamp;
>>         unsigned name_hash_initialized : 1,
>> -                initialized : 1;
>> +                initialized : 1,
>> +                needs_rewrite : 1;
>>         struct hash_table name_hash;
>>         struct hash_table dir_hash;
>>         struct index_ops *ops;
>> @@ -353,6 +375,7 @@ extern void free_name_hash(struct index_state *istate);
>>  #define is_cache_unborn() is_index_unborn(&the_index)
>>  #define read_cache_unmerged() read_index_unmerged(&the_index)
>>  #define write_cache(newfd, cache, entries) write_index(&the_index, (newfd))
>> +#define write_cache_partial(newfd) write_index_partial(&the_index, (newfd))
>>  #define discard_cache() discard_index(&the_index)
>>  #define unmerged_cache() unmerged_index(&the_index)
>>  #define cache_name_pos(name, namelen) index_name_pos(&the_index,(name),(namelen))
>> @@ -529,6 +552,7 @@ extern int read_index_from(struct index_state *, const char *path);
>>  extern int is_index_unborn(struct index_state *);
>>  extern int read_index_unmerged(struct index_state *);
>>  extern int write_index(struct index_state *, int newfd);
>> +extern int write_index_partial(struct index_state *, int newfd);
>>  extern int discard_index(struct index_state *);
>>  extern int unmerged_index(const struct index_state *);
>>  extern int verify_path(const char *path);
>> @@ -613,6 +637,7 @@ extern NORETURN void unable_to_lock_index_die(const char *path, int err);
>>  extern int hold_lock_file_for_update(struct lock_file *, const char *path, int);
>>  extern int hold_lock_file_for_append(struct lock_file *, const char *path, int);
>>  extern int commit_lock_file(struct lock_file *);
>> +extern void remove_lock_file(void);
>>  extern void update_index_if_able(struct index_state *, struct lock_file *);
>>
>>  extern int hold_locked_index(struct lock_file *, int);
>> diff --git a/lockfile.c b/lockfile.c
>> index 8fbcb6a..c150e5c 100644
>> --- a/lockfile.c
>> +++ b/lockfile.c
>> @@ -7,7 +7,7 @@
>>  static struct lock_file *lock_file_list;
>>  static const char *alternate_index_output;
>>
>> -static void remove_lock_file(void)
>> +void remove_lock_file(void)
>>  {
>>         pid_t me = getpid();
>>
>> diff --git a/read-cache-v2.c b/read-cache-v2.c
>> index f884c10..1fec892 100644
>> --- a/read-cache-v2.c
>> +++ b/read-cache-v2.c
>> @@ -555,5 +555,7 @@ struct index_ops v2_ops = {
>>         match_stat_basic,
>>         verify_hdr,
>>         read_index_v2,
>> +       write_index_v2,
>> +       /* Partial writing is the same as writing the full index for v2 */
>>         write_index_v2
>>  };
>> diff --git a/read-cache-v5.c b/read-cache-v5.c
>> index a5e9b5a..13436a3 100644
>> --- a/read-cache-v5.c
>> +++ b/read-cache-v5.c
>> @@ -20,22 +20,6 @@ struct extension_header {
>>         uint32_t crc;
>>  };
>>
>> -struct directory_entry {
>> -       struct directory_entry **sub;
>> -       struct directory_entry *next;
>> -       struct directory_entry *next_hash;
>> -       struct cache_entry *ce;
>> -       struct cache_entry *ce_last;
>> -       uint32_t de_foffset;
>> -       uint32_t de_nsubtrees;
>> -       uint32_t de_nfiles;
>> -       uint32_t de_nentries;
>> -       unsigned char sha1[20];
>> -       uint16_t de_flags;
>> -       uint32_t de_pathlen;
>> -       char pathname[FLEX_ARRAY];
>> -};
>> -
>>  struct conflict_part {
>>         struct conflict_part *next;
>>         uint16_t flags;
>> @@ -246,7 +230,7 @@ static struct directory_entry *read_directories(unsigned int *dir_offset,
>>                 offsetof(struct ondisk_directory_entry, name) - 5;
>>         disk_de = ptr_add(mmap, *dir_offset);
>>         de = directory_entry_from_ondisk(disk_de, len);
>> -
>> +       de->entry_pos = *dir_offset;
>>         data_len = len + 1 + offsetof(struct ondisk_directory_entry, name);
>>         filecrc = ptr_add(mmap, *dir_offset + data_len);
>>         if (!check_crc32(0, ptr_add(mmap, *dir_offset), data_len, ntoh_l(*filecrc)))
>> @@ -281,6 +265,7 @@ static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
>>         entry_offset = first_entry_offset + ntoh_l(*beginning);
>>         disk_ce = ptr_add(mmap, entry_offset);
>>         *ce = cache_entry_from_ondisk(disk_ce, pathname, len, pathlen);
>> +       (*ce)->entry_pos = entry_offset;
>>         filecrc = ptr_add(mmap, entry_offset + len + 1 + sizeof(*disk_ce));
>>         if (!check_crc32(0,
>>                 ptr_add(mmap, entry_offset), len + 1 + sizeof(*disk_ce),
>> @@ -439,6 +424,7 @@ static struct cache_tree *convert_one(struct directory_entry *de)
>>
>>         it = cache_tree();
>>         it->entry_count = de->de_nentries;
>> +       it->de_ref = de;
>>         if (0 <= it->entry_count)
>>                 hashcpy(it->sha1, de->sha1);
>>
>> @@ -523,14 +509,6 @@ static int read_entries(struct index_state *istate, struct directory_entry *de,
>>         return 0;
>>  }
>>
>> -static void free_directory_tree(struct directory_entry *de) {
>> -       int i;
>> -
>> -       for (i = 0; i < de->de_pathlen; i++)
>> -               free_directory_tree(de->sub[i]);
>> -       free(de);
>> -}
>> -
>>  /*
>>   * Read an index-v5 file filtered by the filter_opts.   If opts is NULL,
>>   * everything will be read.
>> @@ -626,7 +604,7 @@ static int read_index_v5(struct index_state *istate, void *mmap,
>>                 }
>>         }
>>         istate->cache_tree = cache_tree_convert_v5(root_directory);
>> -       free_directory_tree(root_directory);
>> +       istate->root_directory = root_directory;
>>         istate->cache_nr = nr;
>>         return 0;
>>  }
>> @@ -696,6 +674,7 @@ static void ce_smudge_racily_clean_entry(struct cache_entry *ce)
>>          * that hasn't changed checking the sha1.
>>          */
>>         ce->ce_flags |= CE_SMUDGED;
>> +       ce->changed = 1;
>>  }
>>
>>  static char *super_directory(char *filename)
>> @@ -1231,6 +1210,103 @@ static int write_resolve_undo(struct index_state *istate,
>>         return 0;
>>  }
>>
>> +static int write_ce_if_necessary(struct cache_entry *ce, void *cb_data)
>> +{
>> +       int *fdx = cb_data, pathlen, size;
>> +       int fd = *fdx;
>> +       char *dir;
>> +       struct ondisk_cache_entry *ondisk;
>> +       uint32_t crc;
>> +
>> +       assert(ce->entry_pos != 0);
>> +       /* TODO I'm just using the_index out of laziness here */
>> +       if (!ce_uptodate(ce) && is_racy_timestamp(&the_index, ce))
>> +               ce_smudge_racily_clean_entry(ce);
>> +       if (!ce->changed)
>> +               return 0;
>> +       if (is_null_sha1(ce->sha1)) {
>> +               static const char msg[] = "cache entry has null sha1: %s";
>> +               static int allow = -1;
>> +
>> +               if (allow < 0)
>> +                       allow = git_env_bool("GIT_ALLOW_NULL_SHA1", 0);
>> +               if (allow)
>> +                       warning(msg, ce->name);
>> +               else
>> +                       return error(msg, ce->name);
>> +       }
>> +       dir = super_directory(ce->name);
>> +       pathlen = dir ? strlen(dir) + 1 : 0;
>> +       size = offsetof(struct ondisk_cache_entry, name) +
>> +               ce_namelen(ce) - pathlen + 1;
>> +       ondisk = xmalloc(size);
>> +
>> +       crc = 0;
>> +       ondisk_from_cache_entry(ce, ondisk, pathlen);
>> +       if (lseek(fd, ce->entry_pos, SEEK_SET) < ce->entry_pos)
>> +               die("error ce seeking");
>> +       if (ce_write(&crc, fd, ondisk, size) < 0)
>> +               return -1;
>> +       crc = htonl(crc);
>> +       if (ce_write(NULL, fd, &crc, 4) < 0)
>> +               return -1;
>> +       return ce_flush(fd);
>> +}
>> +
>> +static void ondisk_from_directory_entry_partial(struct directory_entry *de,
>> +                                               struct ondisk_directory_entry *ondisk)
>> +{
>> +       ondisk->foffset   = htonl(de->de_foffset);
>> +       ondisk->nsubtrees = htonl(de->de_nsubtrees);
>> +       ondisk->nfiles    = htonl(de->de_nfiles);
>> +       ondisk->nentries  = htonl(de->de_nentries);
>> +       hashcpy(ondisk->sha1, de->sha1);
>> +       ondisk->flags     = htons(de->de_flags);
>> +       if (de->de_pathlen == 0) {
>> +               memcpy(ondisk->name, "\0", 1);
>> +       } else {
>> +               memcpy(ondisk->name, de->pathname, de->de_pathlen);
>> +               memcpy(ondisk->name + de->de_pathlen - 1, "/\0", 2);
>> +       }
>> +}
>> +
>> +static int write_directories_partial(struct directory_entry *de, int fd)
>> +{
>> +       int ondisk_size = offsetof(struct ondisk_directory_entry, name);
>> +       int size = ondisk_size + de->de_pathlen + 1;
>> +       int i;
>> +       uint32_t crc;
>> +       struct ondisk_directory_entry *ondisk;
>> +
>> +       if (de->changed) {
>> +               crc = 0;
>> +               ondisk = xmalloc(size);
>> +               ondisk_from_directory_entry_partial(de, ondisk);
>> +               if (lseek(fd, de->entry_pos, SEEK_SET) < de->entry_pos)
>> +                       die("error directory seeking");
>> +               if (ce_write(&crc, fd, ondisk, size) < 0)
>> +                       return -1;
>> +               crc = htonl(crc);
>> +               if (ce_write(NULL, fd, &crc, 4) < 0)
>> +                       return -1;
>> +               free(ondisk);
>> +               if (ce_flush(fd) < 0)
>> +                       return -1;
>> +       }
>> +       for (i = 0; i < de->de_nsubtrees; i++) {
>> +               if (write_directories_partial(de->sub[i], fd) < 0)
>> +                       return -1;
>> +       }
>> +       return 0;
>> +}
>> +
>> +static int write_partial(struct index_state *istate, int fd)
>> +{
>> +       write_directories_partial(istate->root_directory, fd);
>> +
>> +       return for_each_index_entry(istate, write_ce_if_necessary, &fd);
>> +}
>> +
>>  static int write_index_v5(struct index_state *istate, int newfd)
>>  {
>>         struct cache_header hdr;
>> @@ -1296,9 +1372,33 @@ static int write_index_v5(struct index_state *istate, int newfd)
>>         return ce_flush(newfd);
>>  }
>>
>> +static int write_index_partial_v5(struct index_state *istate, int newfd)
>> +{
>> +       int fd;
>> +       char *path = get_index_file();
>> +
>> +       if (istate->needs_rewrite || istate->cache_nr == 0)
>> +               return write_index_v5(istate, newfd);
>> +       if (istate->filter_opts && istate->needs_rewrite)
>> +               die("BUG: cannot write a partially read index");
>> +       fd = open(path, O_RDWR, 0666);
>> +       if (fd < 0) {
>> +               if (errno == ENOENT)
>> +                       die("no index file exists cannot do a partial write");
>> +               die_errno("index file opening for writing failed");
>> +       }
>> +
>> +       if (write_partial(istate, fd) < 0)
>> +               return -1;
>> +       if (ce_flush(fd) < 0)
>> +               return -1;
>> +       return 1;
>> +}
>> +
>>  struct index_ops v5_ops = {
>>         match_stat_basic,
>>         verify_hdr,
>>         read_index_v5,
>> -       write_index_v5
>> +       write_index_v5,
>> +       write_index_partial_v5
>>  };
>> diff --git a/read-cache.c b/read-cache.c
>> index 04430e5..1cad0e2 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -32,6 +32,9 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
>>
>>         remove_name_hash(istate, old);
>>         set_index_entry(istate, nr, ce);
>> +       ce->changed = 1;
>> +       if (ce->entry_pos == 0)
>> +               istate->needs_rewrite = 1;
>>         istate->cache_changed = 1;
>>  }
>>
>> @@ -494,6 +497,7 @@ int remove_index_entry_at(struct index_state *istate, int pos)
>>
>>         record_resolve_undo(istate, ce);
>>         remove_name_hash(istate, ce);
>> +       istate->needs_rewrite = 1;
>>         istate->cache_changed = 1;
>>         istate->cache_nr--;
>>         if (pos >= istate->cache_nr)
>> @@ -520,6 +524,7 @@ void remove_marked_cache_entries(struct index_state *istate)
>>                 else
>>                         ce_array[j++] = ce_array[i];
>>         }
>> +       istate->needs_rewrite = 1;
>>         istate->cache_changed = 1;
>>         istate->cache_nr = j;
>>  }
>> @@ -1024,6 +1029,7 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
>>                         istate->cache + pos,
>>                         (istate->cache_nr - pos - 1) * sizeof(ce));
>>         set_index_entry(istate, pos, ce);
>> +       istate->needs_rewrite = 1;
>>         istate->cache_changed = 1;
>>         return 0;
>>  }
>> @@ -1108,6 +1114,8 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
>>         size = ce_size(ce);
>>         updated = xmalloc(size);
>>         memcpy(updated, ce, size);
>> +       updated->changed = 1;
>> +       updated->entry_pos = ce->entry_pos;
>>         fill_stat_cache_info(updated, &st);
>>         /*
>>          * If ignore_valid is not set, we should leave CE_VALID bit
>> @@ -1201,6 +1209,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>>                                  * means the index is not valid anymore.
>>                                  */
>>                                 ce->ce_flags &= ~CE_VALID;
>> +                               /* TODO: remove this maybe? */
>> +                               istate->needs_rewrite = 1;
>>                                 istate->cache_changed = 1;
>>                         }
>>                         if (quiet)
>> @@ -1241,6 +1251,8 @@ void initialize_index(struct index_state *istate, int version)
>>                         version = atoi(envversion);
>>         }
>>         istate->version = version;
>> +       istate->needs_rewrite = 0;
>> +       istate->root_directory = NULL;
>>         set_istate_ops(istate);
>>  }
>>
>> @@ -1427,6 +1439,16 @@ int is_index_unborn(struct index_state *istate)
>>         return (!istate->cache_nr && !istate->timestamp.sec);
>>  }
>>
>> +static void free_directory_tree(struct directory_entry *de) {
>> +       int i;
>> +
>> +       if (!de)
>> +               return;
>> +       for (i = 0; i < de->de_nsubtrees; i++)
>> +               free_directory_tree(de->sub[i]);
>> +       free(de);
>> +}
>> +
>>  int discard_index(struct index_state *istate)
>>  {
>>         int i;
>> @@ -1435,6 +1457,7 @@ int discard_index(struct index_state *istate)
>>                 free(istate->cache[i]);
>>         resolve_undo_clear_index(istate);
>>         istate->cache_nr = 0;
>> +       istate->needs_rewrite = 0;
>>         istate->cache_changed = 0;
>>         istate->timestamp.sec = 0;
>>         istate->timestamp.nsec = 0;
>> @@ -1446,6 +1469,8 @@ int discard_index(struct index_state *istate)
>>         istate->cache_alloc = 0;
>>         istate->ops = NULL;
>>         istate->filter_opts = NULL;
>> +       free_directory_tree(istate->root_directory);
>> +       istate->root_directory = NULL;
>>         return 0;
>>  }
>>
>> @@ -1491,6 +1516,11 @@ int write_index(struct index_state *istate, int newfd)
>>         return istate->ops->write_index(istate, newfd);
>>  }
>>
>> +int write_index_partial(struct index_state *istate, int newfd)
>> +{
>> +       return istate->ops->write_index_partial(istate, newfd);
>> +}
>> +
>>  /*
>>   * Read the index file that is potentially unmerged into given
>>   * index_state, dropping any unmerged entries.  Returns true if
>> diff --git a/read-cache.h b/read-cache.h
>> index 9d66df6..e7f36ae 100644
>> --- a/read-cache.h
>> +++ b/read-cache.h
>> @@ -31,6 +31,7 @@ struct index_ops {
>>         int (*read_index)(struct index_state *istate, void *mmap, unsigned long mmap_size,
>>                           struct filter_opts *opts);
>>         int (*write_index)(struct index_state *istate, int newfd);
>> +       int (*write_index_partial)(struct index_state *istate, int newfd);
>>  };
>>
>>  extern struct index_ops v2_ops;
>> diff --git a/resolve-undo.c b/resolve-undo.c
>> index c09b006..c496c20 100644
>> --- a/resolve-undo.c
>> +++ b/resolve-undo.c
>> @@ -110,6 +110,7 @@ void resolve_undo_clear_index(struct index_state *istate)
>>         string_list_clear(resolve_undo, 1);
>>         free(resolve_undo);
>>         istate->resolve_undo = NULL;
>> +       istate->needs_rewrite = 1;
>>         istate->cache_changed = 1;
>>  }
>>
>> --
>> 1.8.4.2
>>
>
>
>
> -- 
> Duy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 12/24] read-cache: read index-v5
  2013-11-27 12:00 ` [PATCH v4 12/24] read-cache: read index-v5 Thomas Gummerer
  2013-11-30  9:17   ` Duy Nguyen
@ 2013-11-30 12:19   ` Antoine Pelisse
  2013-11-30 20:10     ` Thomas Gummerer
  2013-11-30 15:26   ` Antoine Pelisse
  2 siblings, 1 reply; 41+ messages in thread
From: Antoine Pelisse @ 2013-11-30 12:19 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Junio C Hamano, Thomas Rast, Michael Haggerty,
	Nguyễn Thái Ngọc Duy, robin.rosenberg,
	Eric Sunshine, ramsay

On Wed, Nov 27, 2013 at 1:00 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> Make git read the index file version 5 without complaining.
>
> This version of the reader reads neither the cache-tree
> nor the resolve undo data, however, it won't choke on an
> index that includes such data.
>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Helped-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
> Helped-by: Thomas Rast <trast@student.ethz.ch>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
> [...]
> +static struct directory_entry *read_directories(unsigned int *dir_offset,
> +                                               unsigned int *dir_table_offset,
> +                                               void *mmap, int mmap_size)

Minor nit: why is this mmap_size "int" while all others are "unsigned long" ?

> [...]
> +static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
> +                     void *mmap, unsigned long mmap_size,
> +                     unsigned int first_entry_offset,
> +                     unsigned int foffsetblock)
> [...]
> +static int read_entries(struct index_state *istate, struct directory_entry *de,
> +                       unsigned int first_entry_offset, void *mmap,
> +                       unsigned long mmap_size, unsigned int *nr,
> +                       unsigned int foffsetblock)
> [...]
> +static int read_index_v5(struct index_state *istate, void *mmap,
> +                        unsigned long mmap_size, struct filter_opts *opts)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 12/24] read-cache: read index-v5
  2013-11-27 12:00 ` [PATCH v4 12/24] read-cache: read index-v5 Thomas Gummerer
  2013-11-30  9:17   ` Duy Nguyen
  2013-11-30 12:19   ` Antoine Pelisse
@ 2013-11-30 15:26   ` Antoine Pelisse
  2013-11-30 20:27     ` Thomas Gummerer
  2 siblings, 1 reply; 41+ messages in thread
From: Antoine Pelisse @ 2013-11-30 15:26 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Junio C Hamano, Thomas Rast, Michael Haggerty,
	Nguyễn Thái Ngọc Duy, robin.rosenberg,
	Eric Sunshine, ramsay

On Wed, Nov 27, 2013 at 1:00 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> +static int verify_hdr(void *mmap, unsigned long size)
> +{
> +       uint32_t *filecrc;
> +       unsigned int header_size;
> +       struct cache_header *hdr;
> +       struct cache_header_v5 *hdr_v5;
> +
> +       if (size < sizeof(struct cache_header)
> +           + sizeof (struct cache_header_v5) + 4)
> +               die("index file smaller than expected");
> +
> +       hdr = mmap;
> +       hdr_v5 = ptr_add(mmap, sizeof(*hdr));
> +       /* Size of the header + the size of the extensionoffsets */
> +       header_size = sizeof(*hdr) + sizeof(*hdr_v5) + hdr_v5->hdr_nextension * 4;
> +       /* Initialize crc */
> +       filecrc = ptr_add(mmap, header_size);
> +       if (!check_crc32(0, hdr, header_size, ntohl(*filecrc)))
> +               return error("bad index file header crc signature");
> +       return 0;
> +}

I find it curious that we actually need a value from the header (and
use it for pointer arithmetic) to check that the header is valid. The
application will crash before the crc is checked if
hdr_v5->hdr_nextension is corrupted. Or am I missing something?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 09/24] ls-files.c: use index api
  2013-11-27 12:00 ` [PATCH v4 09/24] ls-files.c: " Thomas Gummerer
  2013-11-30  9:17   ` Duy Nguyen
@ 2013-11-30 15:39   ` Antoine Pelisse
  2013-11-30 20:08     ` Thomas Gummerer
  1 sibling, 1 reply; 41+ messages in thread
From: Antoine Pelisse @ 2013-11-30 15:39 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Junio C Hamano, Thomas Rast, Michael Haggerty,
	Nguyễn Thái Ngọc Duy, Robin Rosenberg,
	Eric Sunshine, ramsay

On Wed, Nov 27, 2013 at 1:00 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> @@ -447,6 +463,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>         struct dir_struct dir;
>         struct exclude_list *el;
>         struct string_list exclude_list = STRING_LIST_INIT_NODUP;
> +       struct filter_opts *opts = xmalloc(sizeof(*opts));
>         struct option builtin_ls_files_options[] = {
>                 { OPTION_CALLBACK, 'z', NULL, NULL, NULL,
>                         N_("paths are separated with NUL character"),
> @@ -512,9 +529,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>                 prefix_len = strlen(prefix);
>         git_config(git_default_config, NULL);
>
> -       if (read_cache() < 0)
> -               die("index file corrupt");
> -
>         argc = parse_options(argc, argv, prefix, builtin_ls_files_options,
>                         ls_files_usage, 0);
>         el = add_exclude_list(&dir, EXC_CMDL, "--exclude option");
> @@ -550,6 +564,24 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>                        PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
>                        prefix, argv);
>
> +       if (!with_tree && !needs_trailing_slash_stripped()) {
> +               memset(opts, 0, sizeof(*opts));
> +               opts->pathspec = &pathspec;
> +               opts->read_staged = 1;
> +               if (show_resolve_undo)
> +                       opts->read_resolve_undo = 1;
> +               if (read_cache_filtered(opts) < 0)
> +                       die("index file corrupt");
> +       } else {
> +               if (read_cache() < 0)
> +                       die("index file corrupt");
> +               parse_pathspec(&pathspec, 0,
> +                              PATHSPEC_PREFER_CWD |
> +                              PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
> +                              prefix, argv);
> +
> +       }
> +

Would it make sense to move the declaration of "opts" as a non-pointer
to the block where it's used ?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 09/24] ls-files.c: use index api
  2013-11-30 15:39   ` Antoine Pelisse
@ 2013-11-30 20:08     ` Thomas Gummerer
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-30 20:08 UTC (permalink / raw)
  To: Antoine Pelisse
  Cc: git, Junio C Hamano, Thomas Rast, Michael Haggerty,
	Nguyễn Thái Ngọc Duy, Robin Rosenberg,
	Eric Sunshine, ramsay

Antoine Pelisse <apelisse@gmail.com> writes:

> On Wed, Nov 27, 2013 at 1:00 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> @@ -447,6 +463,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>>         struct dir_struct dir;
>>         struct exclude_list *el;
>>         struct string_list exclude_list = STRING_LIST_INIT_NODUP;
>> +       struct filter_opts *opts = xmalloc(sizeof(*opts));
>>         struct option builtin_ls_files_options[] = {
>>                 { OPTION_CALLBACK, 'z', NULL, NULL, NULL,
>>                         N_("paths are separated with NUL character"),
>> @@ -512,9 +529,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>>                 prefix_len = strlen(prefix);
>>         git_config(git_default_config, NULL);
>>
>> -       if (read_cache() < 0)
>> -               die("index file corrupt");
>> -
>>         argc = parse_options(argc, argv, prefix, builtin_ls_files_options,
>>                         ls_files_usage, 0);
>>         el = add_exclude_list(&dir, EXC_CMDL, "--exclude option");
>> @@ -550,6 +564,24 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>>                        PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
>>                        prefix, argv);
>>
>> +       if (!with_tree && !needs_trailing_slash_stripped()) {
>> +               memset(opts, 0, sizeof(*opts));
>> +               opts->pathspec = &pathspec;
>> +               opts->read_staged = 1;
>> +               if (show_resolve_undo)
>> +                       opts->read_resolve_undo = 1;
>> +               if (read_cache_filtered(opts) < 0)
>> +                       die("index file corrupt");
>> +       } else {
>> +               if (read_cache() < 0)
>> +                       die("index file corrupt");
>> +               parse_pathspec(&pathspec, 0,
>> +                              PATHSPEC_PREFER_CWD |
>> +                              PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
>> +                              prefix, argv);
>> +
>> +       }
>> +
>
> Would it make sense to move the declaration of "opts" as a non-pointer
> to the block where it's used ?

Yes, I think that would make sense, will do so in the re-roll. Thanks!
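
For illustration, the suggested shape could look roughly like the sketch
below.  It is not the re-rolled patch: struct filter_opts is reduced to the
fields visible in the quoted hunk, and fake_read_filtered() and
read_index_for_ls_files() are hypothetical stand-ins for
read_cache_filtered() and the relevant part of cmd_ls_files().  It also
assumes the callee does not keep the pointer; the series references
istate->filter_opts elsewhere, so if the reader stores opts there, a
block-scoped variable would have to outlive every later use.

```c
#include <string.h>

/* Hypothetical stand-in for the series' struct filter_opts; only the
 * fields visible in the quoted hunk are modelled. */
struct filter_opts {
	const void *pathspec;
	int read_staged;
	int read_resolve_undo;
};

/* Hypothetical stand-in for read_cache_filtered(): returns a small
 * bitmask of the optional parts that were requested. */
static int fake_read_filtered(const struct filter_opts *opts)
{
	return opts->read_staged + 2 * opts->read_resolve_undo;
}

/* Hypothetical wrapper mirroring the if/else in the quoted hunk. */
static int read_index_for_ls_files(int use_filter, int show_resolve_undo,
				   const void *pathspec)
{
	if (use_filter) {
		/* Non-pointer, scoped to the only block that uses it:
		 * no xmalloc() and nothing to free afterwards. */
		struct filter_opts opts;

		memset(&opts, 0, sizeof(opts));
		opts.pathspec = pathspec;
		opts.read_staged = 1;
		if (show_resolve_undo)
			opts.read_resolve_undo = 1;
		return fake_read_filtered(&opts);
	}
	return 0;	/* the unfiltered read path would go here */
}
```

The memset() stays, so any field not set explicitly is still zeroed, just
as with the heap-allocated version.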

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 12/24] read-cache: read index-v5
  2013-11-30 12:19   ` Antoine Pelisse
@ 2013-11-30 20:10     ` Thomas Gummerer
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-30 20:10 UTC (permalink / raw)
  To: Antoine Pelisse
  Cc: git, Junio C Hamano, Thomas Rast, Michael Haggerty,
	Nguyễn Thái Ngọc Duy, robin.rosenberg,
	Eric Sunshine, ramsay

Antoine Pelisse <apelisse@gmail.com> writes:

> On Wed, Nov 27, 2013 at 1:00 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> Make git read the index file version 5 without complaining.
>>
>> This version of the reader reads neither the cache-tree
>> nor the resolve undo data, however, it won't choke on an
>> index that includes such data.
>>
>> Helped-by: Junio C Hamano <gitster@pobox.com>
>> Helped-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
>> Helped-by: Thomas Rast <trast@student.ethz.ch>
>> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
>> ---
>> [...]
>> +static struct directory_entry *read_directories(unsigned int *dir_offset,
>> +                                               unsigned int *dir_table_offset,
>> +                                               void *mmap, int mmap_size)
>
> Minor nit: why is this mmap_size "int" while all others are "unsigned long" ?

Thanks for catching that.  It should be "unsigned long" here too.  Will
fix in the re-roll.

>> [...]
>> +static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
>> +                     void *mmap, unsigned long mmap_size,
>> +                     unsigned int first_entry_offset,
>> +                     unsigned int foffsetblock)
>> [...]
>> +static int read_entries(struct index_state *istate, struct directory_entry *de,
>> +                       unsigned int first_entry_offset, void *mmap,
>> +                       unsigned long mmap_size, unsigned int *nr,
>> +                       unsigned int foffsetblock)
>> [...]
>> +static int read_index_v5(struct index_state *istate, void *mmap,
>> +                        unsigned long mmap_size, struct filter_opts *opts)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 12/24] read-cache: read index-v5
  2013-11-30 15:26   ` Antoine Pelisse
@ 2013-11-30 20:27     ` Thomas Gummerer
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-11-30 20:27 UTC (permalink / raw)
  To: Antoine Pelisse
  Cc: git, Junio C Hamano, Thomas Rast, Michael Haggerty,
	Nguyễn Thái Ngọc Duy, robin.rosenberg,
	Eric Sunshine, ramsay

Antoine Pelisse <apelisse@gmail.com> writes:

> On Wed, Nov 27, 2013 at 1:00 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> +static int verify_hdr(void *mmap, unsigned long size)
>> +{
>> +       uint32_t *filecrc;
>> +       unsigned int header_size;
>> +       struct cache_header *hdr;
>> +       struct cache_header_v5 *hdr_v5;
>> +
>> +       if (size < sizeof(struct cache_header)
>> +           + sizeof (struct cache_header_v5) + 4)
>> +               die("index file smaller than expected");
>> +
>> +       hdr = mmap;
>> +       hdr_v5 = ptr_add(mmap, sizeof(*hdr));
>> +       /* Size of the header + the size of the extensionoffsets */
>> +       header_size = sizeof(*hdr) + sizeof(*hdr_v5) + hdr_v5->hdr_nextension * 4;
>> +       /* Initialize crc */
>> +       filecrc = ptr_add(mmap, header_size);
>> +       if (!check_crc32(0, hdr, header_size, ntohl(*filecrc)))
>> +               return error("bad index file header crc signature");
>> +       return 0;
>> +}
>
> I find it curious that we actually need a value from the header (and
> use it for pointer arithmetic) to check that the header is valid. The
> application will crash before the crc is checked if
> hdr_v5->hdr_nextension is corrupted. Or am I missing something?

Good catch, I'm the one that was missing something here.  We still need
to use the value from the header before calculating the crc, but should
check that header_size + 4 (the header plus its crc) does not exceed the
total size of the index file.  Then even if the header is corrupted we
won't read anything that is not mmap'ed and thus won't crash.

This guard should also be included for everything else that checks the
crc checksum, as that has the same problems and the calculated place in
the file for the crc might be after the end of the file.

Thanks, will fix in the re-roll.
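
A guard of that kind could be sketched as below.  This is only an
illustration under stated assumptions, not the code from the re-roll:
FIXED_HDR_SIZE and header_fits() are hypothetical names, the constant
merely stands in for sizeof(struct cache_header) +
sizeof(struct cache_header_v5), and the extension count is assumed to
already be in host byte order.

```c
#include <stdint.h>

/* Hypothetical size of the fixed header part; stands in for
 * sizeof(struct cache_header) + sizeof(struct cache_header_v5). */
#define FIXED_HDR_SIZE 24UL

/*
 * Return 1 if a header carrying `nextension` 4-byte extension offsets,
 * followed by its 4-byte crc, lies entirely inside a mapping of `size`
 * bytes, and 0 otherwise.  The check divides the remaining room rather
 * than multiplying the untrusted count, so it cannot overflow.
 */
static int header_fits(unsigned long size, uint32_t nextension)
{
	unsigned long room;

	if (size < FIXED_HDR_SIZE + 4)
		return 0;
	room = size - FIXED_HDR_SIZE - 4;	/* bytes left for offsets */
	return nextension <= room / 4;
}
```

A caller would run this check before computing the crc position and
refuse the file if it fails, so a corrupted count is rejected before any
pointer derived from it is dereferenced.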

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 00/24] Index-v5
  2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
                   ` (23 preceding siblings ...)
  2013-11-27 12:00 ` [PATCH v4 24/24] perf: add partial writing test Thomas Gummerer
@ 2013-12-09 10:14 ` Thomas Gummerer
  24 siblings, 0 replies; 41+ messages in thread
From: Thomas Gummerer @ 2013-12-09 10:14 UTC (permalink / raw)
  To: git
  Cc: gitster, tr, mhagger, pclouds, robin.rosenberg, sunshine, ramsay,
	Antoine Pelisse

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Hi,
>
> previous rounds (without api) are at $gmane/202752, $gmane/202923,
> $gmane/203088 and $gmane/203517, the previous rounds with api were at
> $gmane/229732, $gmane/230210 and $gmane/232488.  Thanks to Duy for
> reviewing the last round and Junio, Ramsay and Eric for additional
> comments.
>
> Since the last round I've added a POC for partial writing, resulting
> in the following performance improvements for update-index:
>
> Test                                        1063432           HEAD
> ------------------------------------------------------------------------------------
> 0003.2: v[23]: update-index                 0.60(0.38+0.20)   0.76(0.36+0.17) +26.7%
> 0003.3: v[23]: grep nonexistent -- subdir   0.28(0.17+0.11)   0.28(0.18+0.09) +0.0%
> 0003.4: v[23]: ls-files -- subdir           0.26(0.15+0.10)   0.24(0.14+0.09) -7.7%
> 0003.7: v[23] update-index                  0.59(0.36+0.22)   0.58(0.36+0.20) -1.7%
> 0003.9: v4: update-index                    0.46(0.28+0.17)   0.45(0.30+0.11) -2.2%
> 0003.10: v4: grep nonexistent -- subdir     0.26(0.14+0.11)   0.21(0.14+0.07) -19.2%
> 0003.11: v4: ls-files -- subdir             0.24(0.14+0.10)   0.20(0.12+0.08) -16.7%
> 0003.14: v4 update-index                    0.49(0.31+0.18)   0.65(0.34+0.17) +32.7%
> 0003.16: v5: update-index                   0.53(0.30+0.22)   0.50(0.28+0.20) -5.7%
> 0003.17: v5: ls-files                       0.27(0.15+0.12)   0.27(0.17+0.10) +0.0%
> 0003.18: v5: grep nonexistent -- subdir     0.02(0.01+0.01)   0.03(0.01+0.01) +50.0%
> 0003.19: v5: ls-files -- subdir             0.02(0.00+0.02)   0.02(0.01+0.01) +0.0%
> 0003.22: v5 update-index                    0.53(0.29+0.23)   0.02(0.01+0.01) -96.2%
>
> Given this, I don't think a complete change of the in-core format for
> the cache-entries is necessary to take full advantage of the new index
> file format.  Instead some changes to the current in-core format would
> work well with the new on-disk format.
>
> The current in-memory format fits the internal needs of git fairly well,
> so I don't think changing it to fit a better index file format would
> make a lot of sense, given that we can take advantage of the new format
> with the existing in-memory format.

Any more opinions on this series?  I've applied the changes suggested by
Duy, Antoine and Eric locally, but I wouldn't want to spam the list with
the whole series without a chance of this being applied.  How do you
want me to proceed?

> This series doesn't use kb/fast-hashmap yet, but that should be fairly
> simple to change if the series is deemed a good change.  The
> performance tests for update-index require
> tg/perf-lib-test-perf-cleanup.
>
> Other changes, made following the review comments are:
>
> documentation: add documentation of the index-v5 file format
>   - Update documentation that directory flags are now 32-bits.  That
>     makes aligned access simpler
>   - offset_to_offset is no longer included in the checksum for files.
>     It's unnecessary.
>
> read-cache: read index-v5
>   - Add fix for reading with different level pathspecs given
>   - Use init_directory_entry to initialize all fields in a new
>     directory entry
>   - use memset to simplify the create_new_conflict function
>   - Add comments to explain -5 when reading directories and files
>   - Add comments for the more complex functions
>   - Add name flex_array to the end of ondisk_directory_entry for
>     simplified reading
>   - Add name flex_array to the end of ondisk_cache_entry for
>     simplified reading
>   - Move conflict reading functions to next patch
>   - mark functions as static when they are
>
> read-cache: read resolve-undo data
>   - Add comments for the more complex function
>   - Read conflicts + resolve undo data as extension
>
> read-cache: read cache-tree in index-v5
>   - Add comments for the more complex function
>   - Instead of sorting the directory entries, sort the cache-tree
>     directly.  This also required changing the algorithms with which
>     the cache entries are extracted from the directory tree.
>
> read-cache: write index-v5
>   - Free pointers allocated by super_directory
>   - Rewrite condition as suggested by Duy
>   - Don't check for CE_REMOVE'd entries in the writing code, they are
>     already checked in the compile_directory_data code
>   - Remove overly complicated directory size calculation since flags
>     are now 32-bits
>
> read-cache: write resolve-undo data for index-v5
>   - Free pointers allocated by super_directory
>   - Write conflicts + resolve undo data as extension
>
> introduce GIT_INDEX_VERSION environment variable
>   - Add documentation for GIT_INDEX_VERSION
>
> test-lib: allow setting the index format version
>
> Removed commits:
>   - read-cache: don't check uid, gid, ino
>   - read-cache: use fixed width integer types (independently in pu)
>   - read-cache: clear version in discard_index()
>
> Typos fixed as suggested by Eric Sunshine
>
> Thomas Gummerer (22):
>   read-cache: split index file version specific functionality
>   read-cache: move index v2 specific functions to their own file
>   read-cache: Re-read index if index file changed
>   add documentation for the index api
>   read-cache: add index reading api
>   make sure partially read index is not changed
>   grep.c: use index api
>   ls-files.c: use index api
>   documentation: add documentation of the index-v5 file format
>   read-cache: make in-memory format aware of stat_crc
>   read-cache: read index-v5
>   read-cache: read resolve-undo data
>   read-cache: read cache-tree in index-v5
>   read-cache: write index-v5
>   read-cache: write index-v5 cache-tree data
>   read-cache: write resolve-undo data for index-v5
>   update-index.c: rewrite index when index-version is given
>   introduce GIT_INDEX_VERSION environment variable
>   test-lib: allow setting the index format version
>   t1600: add index v5 specific tests
>   POC for partial writing
>   perf: add partial writing test
>
> Thomas Rast (1):
>   p0003-index.sh: add perf test for the index formats
>
>  Documentation/git.txt                            |    5 +
>  Documentation/technical/api-in-core-index.txt    |   56 +-
>  Documentation/technical/index-file-format-v5.txt |  294 +++++
>  Makefile                                         |   10 +
>  builtin/apply.c                                  |    2 +
>  builtin/grep.c                                   |   69 +-
>  builtin/ls-files.c                               |   36 +-
>  builtin/update-index.c                           |   50 +-
>  cache-tree.c                                     |   15 +-
>  cache-tree.h                                     |    2 +
>  cache.h                                          |  115 +-
>  lockfile.c                                       |    2 +-
>  read-cache-v2.c                                  |  561 +++++++++
>  read-cache-v5.c                                  | 1406 ++++++++++++++++++++++
>  read-cache.c                                     |  691 +++--------
>  read-cache.h                                     |   67 ++
>  resolve-undo.c                                   |    1 +
>  t/perf/p0003-index.sh                            |   74 ++
>  t/t1600-index-v5.sh                              |   25 +
>  t/t2101-update-index-reupdate.sh                 |   12 +-
>  t/test-lib-functions.sh                          |    5 +
>  t/test-lib.sh                                    |    3 +
>  test-index-version.c                             |    6 +
>  unpack-trees.c                                   |    3 +-
>  24 files changed, 2921 insertions(+), 589 deletions(-)
>  create mode 100644 Documentation/technical/index-file-format-v5.txt
>  create mode 100644 read-cache-v2.c
>  create mode 100644 read-cache-v5.c
>  create mode 100644 read-cache.h
>  create mode 100755 t/perf/p0003-index.sh
>  create mode 100755 t/t1600-index-v5.sh
>
> --
> 1.8.4.2
>

--
Thomas


Thread overview: 41+ messages
2013-11-27 12:00 [PATCH v4 00/24] Index-v5 Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 01/24] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 02/24] read-cache: split index file version specific functionality Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 03/24] read-cache: move index v2 specific functions to their own file Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 04/24] read-cache: Re-read index if index file changed Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 05/24] add documentation for the index api Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 06/24] read-cache: add index reading api Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 07/24] make sure partially read index is not changed Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 08/24] grep.c: use index api Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 09/24] ls-files.c: " Thomas Gummerer
2013-11-30  9:17   ` Duy Nguyen
2013-11-30 10:30     ` Thomas Gummerer
2013-11-30 15:39   ` Antoine Pelisse
2013-11-30 20:08     ` Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 10/24] documentation: add documentation of the index-v5 file format Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 11/24] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 12/24] read-cache: read index-v5 Thomas Gummerer
2013-11-30  9:17   ` Duy Nguyen
2013-11-30 10:40     ` Thomas Gummerer
2013-11-30 12:19   ` Antoine Pelisse
2013-11-30 20:10     ` Thomas Gummerer
2013-11-30 15:26   ` Antoine Pelisse
2013-11-30 20:27     ` Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 13/24] read-cache: read resolve-undo data Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 14/24] read-cache: read cache-tree in index-v5 Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 15/24] read-cache: write index-v5 Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 16/24] read-cache: write index-v5 cache-tree data Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 17/24] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 18/24] update-index.c: rewrite index when index-version is given Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 19/24] p0003-index.sh: add perf test for the index formats Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 20/24] introduce GIT_INDEX_VERSION environment variable Thomas Gummerer
2013-11-27 21:57   ` Eric Sunshine
2013-11-27 22:08     ` Junio C Hamano
2013-11-28  9:57       ` Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 21/24] test-lib: allow setting the index format version Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 22/24] t1600: add index v5 specific tests Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 23/24] POC for partial writing Thomas Gummerer
2013-11-30  9:58   ` Duy Nguyen
2013-11-30 10:50     ` Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 24/24] perf: add partial writing test Thomas Gummerer
2013-12-09 10:14 ` [PATCH v4 00/24] Index-v5 Thomas Gummerer
