* [PATCH 0/8] Resurrect rr/svn-export
@ 2010-07-15 16:22 Ramkumar Ramachandra
  2010-07-15 16:22 ` [PATCH 1/8] Export parse_date_basic() to convert a date string to timestamp Ramkumar Ramachandra
                   ` (8 more replies)
  0 siblings, 9 replies; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-07-15 16:22 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier, Junio C Hamano

Hi,

I've decided to get this series merged now instead of waiting for the
ternary treap refactor. It's in excellent shape, thanks to Jonathan's
constant reviews/fixes and David's constant refactoring. Since
Jonathan's last send, I've incorporated some suggestions from my own
review of Jonathan's series and split it into two series as I proposed
earlier; I'll send the second series (rr/contrib-svn-fe) shortly after
this.

Once this series is merged, I estimate that the following will come in
as incremental patches when the work is finished:
rr/ternary-trp-refactor
rr/zero-tree-refactor
rr/dumpfilev3-parser

Thanks.

-- Ram

David Barr (5):
  Add memory pool library
  Add string-specific memory pool
  Add stream helper library
  Add infrastructure to write revisions in fast-export format
  Add SVN dump parser

Jason Evans (1):
  Add treap implementation

Jonathan Nieder (2):
  Export parse_date_basic() to convert a date string to timestamp
  Introduce vcs-svn lib

 Makefile              |   12 ++-
 cache.h               |    1 +
 date.c                |   14 +--
 vcs-svn/LICENSE       |   33 +++++
 vcs-svn/fast_export.c |   75 +++++++++++
 vcs-svn/fast_export.h |   14 ++
 vcs-svn/line_buffer.c |   93 ++++++++++++++
 vcs-svn/line_buffer.h |   14 ++
 vcs-svn/obj_pool.h    |   80 ++++++++++++
 vcs-svn/repo_tree.c   |  335 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/repo_tree.h   |   26 ++++
 vcs-svn/string_pool.c |  114 +++++++++++++++++
 vcs-svn/string_pool.h |   15 +++
 vcs-svn/svndump.c     |  289 ++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndump.h     |    8 ++
 vcs-svn/trp.h         |  220 ++++++++++++++++++++++++++++++++
 vcs-svn/trp.txt       |  102 +++++++++++++++
 17 files changed, 1436 insertions(+), 9 deletions(-)
 create mode 100644 vcs-svn/LICENSE
 create mode 100644 vcs-svn/fast_export.c
 create mode 100644 vcs-svn/fast_export.h
 create mode 100644 vcs-svn/line_buffer.c
 create mode 100644 vcs-svn/line_buffer.h
 create mode 100644 vcs-svn/obj_pool.h
 create mode 100644 vcs-svn/repo_tree.c
 create mode 100644 vcs-svn/repo_tree.h
 create mode 100644 vcs-svn/string_pool.c
 create mode 100644 vcs-svn/string_pool.h
 create mode 100644 vcs-svn/svndump.c
 create mode 100644 vcs-svn/svndump.h
 create mode 100644 vcs-svn/trp.h
 create mode 100644 vcs-svn/trp.txt


* [PATCH 1/8] Export parse_date_basic() to convert a date string to timestamp
  2010-07-15 16:22 [PATCH 0/8] Resurrect rr/svn-export Ramkumar Ramachandra
@ 2010-07-15 16:22 ` Ramkumar Ramachandra
  2010-07-15 17:25   ` Jonathan Nieder
  2010-07-15 16:22 ` [PATCH 2/8] Introduce vcs-svn lib Ramkumar Ramachandra
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-07-15 16:22 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier, Junio C Hamano

From: Jonathan Nieder <jrnieder@gmail.com>

approxidate() is not appropriate for reading machine-written dates
because it guesses instead of erroring out on malformed dates.
parse_date() is less convenient since it returns its output as a
string.  So export the underlying function that writes a timestamp.

While at it, change the return value to match the usual convention:
return 0 for success and -1 for failure.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 cache.h |    1 +
 date.c  |   14 ++++++--------
 2 files changed, 7 insertions(+), 8 deletions(-)
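
For readers following along, the calling convention adopted here (0 on
success, -1 on failure, result through an out-parameter) can be sketched
in isolation.  The parser below is a stand-in, not git's
parse_date_basic(): strict_parse_epoch() and seconds_between() are
hypothetical names, and the parser only accepts a plain decimal number
of seconds.  It is meant to show a strict parser that rejects malformed
input rather than guessing, and the "nonzero means bail out" pattern
that callers can then use.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a strict date parser: accepts only a
 * non-empty string of decimal digits (seconds since the epoch).
 * Returns 0 on success and -1 on failure, like parse_date_basic().
 */
static int strict_parse_epoch(const char *s, unsigned long *timestamp)
{
	char *end;
	unsigned long val;

	assert(timestamp);	/* passing NULL is a programming error */
	if (!s || *s < '0' || *s > '9')
		return -1;	/* empty, signed, or non-numeric input */
	errno = 0;
	val = strtoul(s, &end, 10);
	if (errno || *end)
		return -1;	/* overflow or trailing garbage */
	*timestamp = val;
	return 0;
}

/* Caller pattern: a nonzero return from the parser means "bail out". */
static int seconds_between(const char *from, const char *to,
			   unsigned long *delta)
{
	unsigned long a, b;

	if (strict_parse_epoch(from, &a) || strict_parse_epoch(to, &b))
		return -1;
	if (b < a)
		return -1;
	*delta = b - a;
	return 0;
}
```

With this convention the success path reads straight down the function
and every failure is an early return, which is the shape parse_date()
takes after this patch.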

diff --git a/cache.h b/cache.h
index c9fa3df..68258be 100644
--- a/cache.h
+++ b/cache.h
@@ -811,6 +811,7 @@ const char *show_date_relative(unsigned long time, int tz,
 			       char *timebuf,
 			       size_t timebuf_size);
 int parse_date(const char *date, char *buf, int bufsize);
+int parse_date_basic(const char *date, unsigned long *timestamp, int *offset);
 void datestamp(char *buf, int bufsize);
 #define approxidate(s) approxidate_careful((s), NULL)
 unsigned long approxidate_careful(const char *, int *);
diff --git a/date.c b/date.c
index 3c981f7..00f9eb5 100644
--- a/date.c
+++ b/date.c
@@ -586,7 +586,7 @@ static int date_string(unsigned long date, int offset, char *buf, int len)
 
 /* Gr. strptime is crap for this; it doesn't have a way to require RFC2822
    (i.e. English) day/month names, and it doesn't work correctly with %z. */
-int parse_date_toffset(const char *date, unsigned long *timestamp, int *offset)
+int parse_date_basic(const char *date, unsigned long *timestamp, int *offset)
 {
 	struct tm tm;
 	int tm_gmt;
@@ -642,17 +642,16 @@ int parse_date_toffset(const char *date, unsigned long *timestamp, int *offset)
 
 	if (!tm_gmt)
 		*timestamp -= *offset * 60;
-	return 1; /* success */
+	return 0; /* success */
 }
 
 int parse_date(const char *date, char *result, int maxlen)
 {
 	unsigned long timestamp;
 	int offset;
-	if (parse_date_toffset(date, &timestamp, &offset) > 0)
-		return date_string(timestamp, offset, result, maxlen);
-	else
+	if (parse_date_basic(date, &timestamp, &offset))
 		return -1;
+	return date_string(timestamp, offset, result, maxlen);
 }
 
 enum date_mode parse_date_format(const char *format)
@@ -1004,9 +1003,8 @@ unsigned long approxidate_relative(const char *date, const struct timeval *tv)
 	int offset;
 	int errors = 0;
 
-	if (parse_date_toffset(date, &timestamp, &offset) > 0)
+	if (!parse_date_basic(date, &timestamp, &offset))
 		return timestamp;
-
 	return approxidate_str(date, tv, &errors);
 }
 
@@ -1019,7 +1017,7 @@ unsigned long approxidate_careful(const char *date, int *error_ret)
 	if (!error_ret)
 		error_ret = &dummy;
 
-	if (parse_date_toffset(date, &timestamp, &offset) > 0) {
+	if (!parse_date_basic(date, &timestamp, &offset)) {
 		*error_ret = 0;
 		return timestamp;
 	}
-- 
1.7.1


* [PATCH 2/8] Introduce vcs-svn lib
  2010-07-15 16:22 [PATCH 0/8] Resurrect rr/svn-export Ramkumar Ramachandra
  2010-07-15 16:22 ` [PATCH 1/8] Export parse_date_basic() to convert a date string to timestamp Ramkumar Ramachandra
@ 2010-07-15 16:22 ` Ramkumar Ramachandra
  2010-07-15 17:46   ` Jonathan Nieder
  2010-07-15 16:22 ` [PATCH 3/8] Add memory pool library Ramkumar Ramachandra
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-07-15 16:22 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier, Junio C Hamano

From: Jonathan Nieder <jrnieder@gmail.com>

Teach the build system to build a separate library for the
upcoming subversion interop support.

The resulting vcs-svn/lib.a does not contain any code, nor is
it built during a normal build.  This is just scaffolding for
later changes.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 Makefile |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 9aca8a1..6441dcb 100644
--- a/Makefile
+++ b/Makefile
@@ -468,6 +468,7 @@ export PYTHON_PATH
 
 LIB_FILE=libgit.a
 XDIFF_LIB=xdiff/lib.a
+VCSSVN_LIB=vcs-svn/lib.a
 
 LIB_H += advice.h
 LIB_H += archive.h
@@ -1739,7 +1740,8 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS)
+VCSSVN_OBJS =
+OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
 dep_dirs := $(addsuffix .depend,$(sort $(dir $(OBJECTS))))
@@ -1860,6 +1862,8 @@ http.o http-walker.o http-push.o remote-curl.o: http.h
 xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xinclude.h xdiff/xmacros.h xdiff/xdiff.h xdiff/xtypes.h \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
+
+$(VCSSVN_OBJS):
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
@@ -1908,6 +1912,8 @@ $(LIB_FILE): $(LIB_OBJS)
 $(XDIFF_LIB): $(XDIFF_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(XDIFF_OBJS)
 
+$(VCSSVN_LIB): $(VCSSVN_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(VCSSVN_OBJS)
 
 doc:
 	$(MAKE) -C Documentation all
-- 
1.7.1


* [PATCH 3/8] Add memory pool library
  2010-07-15 16:22 [PATCH 0/8] Resurrect rr/svn-export Ramkumar Ramachandra
  2010-07-15 16:22 ` [PATCH 1/8] Export parse_date_basic() to convert a date string to timestamp Ramkumar Ramachandra
  2010-07-15 16:22 ` [PATCH 2/8] Introduce vcs-svn lib Ramkumar Ramachandra
@ 2010-07-15 16:22 ` Ramkumar Ramachandra
  2010-07-15 18:57   ` Jonathan Nieder
  2010-07-15 16:23 ` [PATCH 4/8] Add treap implementation Ramkumar Ramachandra
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-07-15 16:22 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier, Junio C Hamano

From: David Barr <david.barr@cordelta.com>

Add a memory pool library implemented using C macros. The obj_pool_gen()
macro creates a type-specific memory pool API.

The memory pool library is distinguished from the existing specialized
allocators in alloc.c by using a contiguous block for all allocations.
This means that on one hand, long-lived pointers have to be written as
offsets, since the base address changes as the pool grows, but on the
other hand, the entire pool can be easily written to the file system.
This allows the memory pool to persist between runs of an application.

For the svn importer, such a facility is useful because each svn
revision can copy trees and files from any previous revision.  The
relevant information for all revisions has to persist somehow to
support incremental runs, and for now it is simplest to avoid relying
on the target VCS for that.

obj_pool_gen(pre, obj_t, initial_capacity)

	pre: Prefix for generated functions (example: string).
	obj_t: Type of the objects stored in the pool (example: char).
	initial_capacity: Initial size of the memory pool (example: 4096).

void pre_init(void);

	Read values from a previous run to initialize the pool.
	If this function is not called, the pool begins valid but empty.

uint32_t pre_alloc(uint32_t nmemb);

	Reserve space for a few objects in the pool and return an
	offset to the first one.

void pre_free(uint32_t nmemb);

	Unreserve the last few objects reserved.

uint32_t pre_offset(obj_t *pointer);
obj_t *pre_pointer(uint32_t offset);

	Convert between pointers into the in-memory pool and offsets
	from the beginning (or ~0 for the NULL pointer).  Pointers are
	not guaranteed to remain valid after a pre_alloc() operation
	or pre_reset() followed by pre_init(), but offsets are.

void pre_commit(void);

	Write the pool to file.  A pre_reset() followed by pre_init()
	(perhaps with exit() in between) will return the pool to the
	last committed state.

void pre_reset(void);

	Deinitialize the pool, freeing any associated memory and
	file handles.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Makefile           |    3 +-
 vcs-svn/LICENSE    |   26 +++++++++++++++++
 vcs-svn/obj_pool.h |   80 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 108 insertions(+), 1 deletions(-)
 create mode 100644 vcs-svn/LICENSE
 create mode 100644 vcs-svn/obj_pool.h
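
The pointer-versus-offset trade-off described in the commit message can
be seen in a stripped-down sketch.  This is not obj_pool.h: it is a
hand-written pool of ints with hypothetical names (int_pool,
int_pool_alloc, int_pool_at), no file persistence, and no graceful
handling of allocation failure.  It only shows why callers hold offsets
rather than pointers across allocations.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, in-memory-only pool of ints addressed by offset. */
struct int_pool {
	uint32_t size;		/* objects in use */
	uint32_t capacity;	/* objects allocated */
	int *base;		/* may move whenever the pool grows */
};

/* Reserve count objects; return the offset of the first one. */
static uint32_t int_pool_alloc(struct int_pool *p, uint32_t count)
{
	uint32_t offset = p->size;

	if (p->size + count > p->capacity) {
		uint32_t cap = p->capacity ? p->capacity : 4;
		while (p->size + count > cap)
			cap *= 2;
		p->base = realloc(p->base, cap * sizeof(*p->base));
		assert(p->base);	/* sketch: no graceful OOM handling */
		p->capacity = cap;
	}
	p->size += count;
	return offset;
}

/* Convert an offset to a pointer that is valid until the next alloc. */
static int *int_pool_at(struct int_pool *p, uint32_t offset)
{
	return offset >= p->size ? NULL : &p->base[offset];
}

/* Store a value, grow the pool, then read the value back by offset. */
static int int_pool_demo(void)
{
	struct int_pool pool = { 0, 0, NULL };
	uint32_t first = int_pool_alloc(&pool, 1);
	int value;

	*int_pool_at(&pool, first) = 42;
	int_pool_alloc(&pool, 1000);		/* base may have moved */
	value = *int_pool_at(&pool, first);	/* offset is still valid */
	free(pool.base);
	return value;
}
```

A pointer obtained before the second int_pool_alloc() call may dangle
after it, whereas the offset can always be converted again.  The same
property is what makes it possible to write the whole block to a file
and read it back at a different address.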

diff --git a/Makefile b/Makefile
index 6441dcb..fc31ee0 100644
--- a/Makefile
+++ b/Makefile
@@ -1863,7 +1863,8 @@ xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xinclude.h xdiff/xmacros.h xdiff/xdiff.h xdiff/xtypes.h \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
-$(VCSSVN_OBJS):
+$(VCSSVN_OBJS): \
+	vcs-svn/obj_pool.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/vcs-svn/LICENSE b/vcs-svn/LICENSE
new file mode 100644
index 0000000..6e52372
--- /dev/null
+++ b/vcs-svn/LICENSE
@@ -0,0 +1,26 @@
+Copyright (C) 2010 David Barr <david.barr@cordelta.com>.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+1. Redistributions of source code must retain the above copyright
+   notice(s), this list of conditions and the following disclaimer
+   unmodified other than the allowable addition of one or more
+   copyright notices.
+2. Redistributions in binary form must reproduce the above copyright
+   notice(s), this list of conditions and the following disclaimer in
+   the documentation and/or other materials provided with the
+   distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) ``AS IS'' AND ANY
+EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/vcs-svn/obj_pool.h b/vcs-svn/obj_pool.h
new file mode 100644
index 0000000..f60c872
--- /dev/null
+++ b/vcs-svn/obj_pool.h
@@ -0,0 +1,80 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#ifndef OBJ_POOL_H_
+#define OBJ_POOL_H_
+
+#include "git-compat-util.h"
+
+#define MAYBE_UNUSED __attribute__((__unused__))
+
+#define obj_pool_gen(pre, obj_t, initial_capacity) \
+static struct { \
+	uint32_t committed; \
+	uint32_t size; \
+	uint32_t capacity; \
+	obj_t *base; \
+	FILE *file; \
+} pre##_pool = { 0, 0, 0, NULL, NULL}; \
+static MAYBE_UNUSED void pre##_init(void) \
+{ \
+	struct stat st; \
+	pre##_pool.file = fopen(#pre ".bin", "a+"); \
+	rewind(pre##_pool.file); \
+	fstat(fileno(pre##_pool.file), &st); \
+	pre##_pool.size = st.st_size / sizeof(obj_t); \
+	pre##_pool.committed = pre##_pool.size; \
+	pre##_pool.capacity = pre##_pool.size * 2; \
+	if (pre##_pool.capacity < initial_capacity) \
+		pre##_pool.capacity = initial_capacity; \
+	pre##_pool.base = malloc(pre##_pool.capacity * sizeof(obj_t)); \
+	fread(pre##_pool.base, sizeof(obj_t), pre##_pool.size, pre##_pool.file); \
+} \
+static MAYBE_UNUSED uint32_t pre##_alloc(uint32_t count) \
+{ \
+	uint32_t offset; \
+	if (pre##_pool.size + count > pre##_pool.capacity) { \
+		while (pre##_pool.size + count > pre##_pool.capacity) \
+			if (pre##_pool.capacity) \
+				pre##_pool.capacity *= 2; \
+			else \
+				pre##_pool.capacity = initial_capacity; \
+		pre##_pool.base = realloc(pre##_pool.base, \
+					pre##_pool.capacity * sizeof(obj_t)); \
+	} \
+	offset = pre##_pool.size; \
+	pre##_pool.size += count; \
+	return offset; \
+} \
+static MAYBE_UNUSED void pre##_free(uint32_t count) \
+{ \
+	pre##_pool.size -= count; \
+} \
+static MAYBE_UNUSED uint32_t pre##_offset(obj_t *obj) \
+{ \
+	return obj == NULL ? ~0 : obj - pre##_pool.base; \
+} \
+static MAYBE_UNUSED obj_t *pre##_pointer(uint32_t offset) \
+{ \
+	return offset >= pre##_pool.size ? NULL : &pre##_pool.base[offset]; \
+} \
+static MAYBE_UNUSED void pre##_commit(void) \
+{ \
+	pre##_pool.committed += fwrite(pre##_pool.base + pre##_pool.committed, \
+		sizeof(obj_t), pre##_pool.size - pre##_pool.committed, \
+		pre##_pool.file); \
+} \
+static MAYBE_UNUSED void pre##_reset(void) \
+{ \
+	free(pre##_pool.base); \
+	if (pre##_pool.file) \
+		fclose(pre##_pool.file); \
+	pre##_pool.base = NULL; \
+	pre##_pool.size = 0; \
+	pre##_pool.capacity = 0; \
+	pre##_pool.file = NULL; \
+}
+
+#endif
-- 
1.7.1


* [PATCH 4/8] Add treap implementation
  2010-07-15 16:22 [PATCH 0/8] Resurrect rr/svn-export Ramkumar Ramachandra
                   ` (2 preceding siblings ...)
  2010-07-15 16:22 ` [PATCH 3/8] Add memory pool library Ramkumar Ramachandra
@ 2010-07-15 16:23 ` Ramkumar Ramachandra
  2010-07-15 19:09   ` Jonathan Nieder
  2010-07-15 16:23 ` [PATCH 5/8] Add string-specific memory pool Ramkumar Ramachandra
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-07-15 16:23 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier, Junio C Hamano

From: Jason Evans <jasone@canonware.com>

Provide macros to generate a type-specific treap implementation and
various functions to operate on it. It uses obj_pool.h to store memory
nodes in a treap.  Previously committed nodes are never removed from
the pool; after any *_commit operation, it is assumed (correctly, in
the case of svn-fast-export) that someone else must care about them.

Treaps provide a memory-efficient binary search tree structure.
Insertion/deletion/search are about as fast in the average
case as red-black trees and the chances of worst-case behavior are
vanishingly small, thanks to (pseudo-)randomness.  The bad worst-case
behavior is a small price to pay, given that treaps are much simpler
to implement.

From http://www.canonware.com/download/trp/trp_hash/trp.h

[db: Altered to reference nodes by offset from a common base pointer]
[db: Bob Jenkins' hashing implementation dropped for Knuth's]
[db: Methods unnecessary for search and insert dropped]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Makefile        |    2 +-
 vcs-svn/LICENSE |    3 +
 vcs-svn/trp.h   |  220 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/trp.txt |  102 +++++++++++++++++++++++++
 4 files changed, 326 insertions(+), 1 deletions(-)
 create mode 100644 vcs-svn/trp.h
 create mode 100644 vcs-svn/trp.txt
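
As an illustration of the structure described above, here is a minimal
treap over a fixed array, with nodes linked by index and priorities
derived from the index using the same multiplicative constant that
trp.h uses.  It is a sketch under simplifying assumptions (int keys, a
static array instead of obj_pool.h, no removal, hypothetical names such
as tiny_insert), not the macro implementation in this patch.

```c
#include <assert.h>
#include <stdint.h>

#define NIL (~(uint32_t)0)
#define MAX_NODES 64

struct tiny_node {
	int key;
	uint32_t left, right;	/* indices into nodes[], or NIL */
};

static struct tiny_node nodes[MAX_NODES];
static uint32_t node_count;

/* Pseudo-random priority derived from the node's index. */
static uint32_t prio(uint32_t n)
{
	return 2654435761u * n;
}

/* Insert node ins below cur; return the new root of that subtree. */
static uint32_t tiny_insert_at(uint32_t cur, uint32_t ins)
{
	uint32_t child;

	if (cur == NIL)
		return ins;
	if (nodes[ins].key < nodes[cur].key) {
		child = tiny_insert_at(nodes[cur].left, ins);
		nodes[cur].left = child;
		if (prio(child) < prio(cur)) {	/* rotate right */
			nodes[cur].left = nodes[child].right;
			nodes[child].right = cur;
			return child;
		}
	} else {
		child = tiny_insert_at(nodes[cur].right, ins);
		nodes[cur].right = child;
		if (prio(child) < prio(cur)) {	/* rotate left */
			nodes[cur].right = nodes[child].left;
			nodes[child].left = cur;
			return child;
		}
	}
	return cur;
}

/* Add key to the treap rooted at *root. */
static void tiny_insert(uint32_t *root, int key)
{
	uint32_t n = node_count++;

	assert(n < MAX_NODES);
	nodes[n].key = key;
	nodes[n].left = nodes[n].right = NIL;
	*root = tiny_insert_at(*root, n);
}

/* Return the index of a node holding key, or NIL if absent. */
static uint32_t tiny_search(uint32_t root, int key)
{
	while (root != NIL && nodes[root].key != key)
		root = key < nodes[root].key ?
			nodes[root].left : nodes[root].right;
	return root;
}

/* Return 1 if no child has a smaller priority than its parent. */
static int tiny_heap_ok(uint32_t root)
{
	uint32_t l, r;

	if (root == NIL)
		return 1;
	l = nodes[root].left;
	r = nodes[root].right;
	if ((l != NIL && prio(l) < prio(root)) ||
	    (r != NIL && prio(r) < prio(root)))
		return 0;
	return tiny_heap_ok(l) && tiny_heap_ok(r);
}
```

Keys are ordered as in any binary search tree, while the rotations keep
priorities in heap order.  Because the priority is a hash of the node's
position rather than a stored field, a node needs only its two links.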

diff --git a/Makefile b/Makefile
index fc31ee0..663a366 100644
--- a/Makefile
+++ b/Makefile
@@ -1864,7 +1864,7 @@ xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
 $(VCSSVN_OBJS): \
-	vcs-svn/obj_pool.h
+	vcs-svn/obj_pool.h vcs-svn/trp.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/vcs-svn/LICENSE b/vcs-svn/LICENSE
index 6e52372..a3d384c 100644
--- a/vcs-svn/LICENSE
+++ b/vcs-svn/LICENSE
@@ -1,6 +1,9 @@
 Copyright (C) 2010 David Barr <david.barr@cordelta.com>.
 All rights reserved.
 
+Copyright (C) 2008 Jason Evans <jasone@canonware.com>.
+All rights reserved.
+
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions
 are met:
diff --git a/vcs-svn/trp.h b/vcs-svn/trp.h
new file mode 100644
index 0000000..dd7d5ee
--- /dev/null
+++ b/vcs-svn/trp.h
@@ -0,0 +1,220 @@
+/*
+ * C macro implementation of treaps.
+ *
+ * Usage:
+ *   #include <stdint.h>
+ *   #include "trp.h"
+ *   trp_gen(...)
+ *
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#ifndef TRP_H_
+#define TRP_H_
+
+#define MAYBE_UNUSED __attribute__((__unused__))
+
+/* Node structure. */
+struct trp_node {
+	uint32_t trpn_left;
+	uint32_t trpn_right;
+};
+
+/* Root structure. */
+struct trp_root {
+	uint32_t trp_root;
+};
+
+/* Pointer/Offset conversion. */
+#define trpn_pointer(a_base, a_offset) (a_base##_pointer(a_offset))
+#define trpn_offset(a_base, a_pointer) (a_base##_offset(a_pointer))
+#define trpn_modify(a_base, a_offset) \
+	do { \
+		if ((a_offset) < a_base##_pool.committed) { \
+			uint32_t old_offset = (a_offset);\
+			(a_offset) = a_base##_alloc(1); \
+			*trpn_pointer(a_base, a_offset) = \
+				*trpn_pointer(a_base, old_offset); \
+		} \
+	} while (0)
+
+/* Left accessors. */
+#define trp_left_get(a_base, a_field, a_node) \
+	(trpn_pointer(a_base, a_node)->a_field.trpn_left)
+#define trp_left_set(a_base, a_field, a_node, a_left) \
+	do { \
+		trpn_modify(a_base, a_node); \
+		trp_left_get(a_base, a_field, a_node) = (a_left); \
+	} while(0)
+
+/* Right accessors. */
+#define trp_right_get(a_base, a_field, a_node) \
+	(trpn_pointer(a_base, a_node)->a_field.trpn_right)
+#define trp_right_set(a_base, a_field, a_node, a_right) \
+	do { \
+		trpn_modify(a_base, a_node); \
+		trp_right_get(a_base, a_field, a_node) = (a_right); \
+	} while(0)
+
+/*
+ * Fibonacci hash function.
+ * The multiplier is the nearest prime to (2^32 times (√5 - 1)/2).
+ * See Knuth §6.4: volume 3, 3rd ed, p518.
+ */
+#define trpn_hash(a_node) (uint32_t) (2654435761u * (a_node))
+
+/* Priority accessors. */
+#define trp_prio_get(a_node) trpn_hash(a_node)
+
+/* Node initializer. */
+#define trp_node_new(a_base, a_field, a_node) \
+	do { \
+		trp_left_set(a_base, a_field, (a_node), ~0); \
+		trp_right_set(a_base, a_field, (a_node), ~0); \
+	} while(0)
+
+/* Internal utility macros. */
+#define trpn_first(a_base, a_field, a_root, r_node) \
+	do { \
+		(r_node) = (a_root); \
+		if ((r_node) == ~0) \
+			return NULL; \
+		while (~trp_left_get(a_base, a_field, (r_node))) \
+			(r_node) = trp_left_get(a_base, a_field, (r_node)); \
+	} while (0)
+
+#define trpn_rotate_left(a_base, a_field, a_node, r_node) \
+	do { \
+		(r_node) = trp_right_get(a_base, a_field, (a_node)); \
+		trp_right_set(a_base, a_field, (a_node), \
+			trp_left_get(a_base, a_field, (r_node))); \
+		trp_left_set(a_base, a_field, (r_node), (a_node)); \
+	} while(0)
+
+#define trpn_rotate_right(a_base, a_field, a_node, r_node) \
+	do { \
+		(r_node) = trp_left_get(a_base, a_field, (a_node)); \
+		trp_left_set(a_base, a_field, (a_node), \
+			trp_right_get(a_base, a_field, (r_node))); \
+		trp_right_set(a_base, a_field, (r_node), (a_node)); \
+	} while(0)
+
+#define trp_gen(a_attr, a_pre, a_type, a_field, a_base, a_cmp) \
+a_attr a_type MAYBE_UNUSED *a_pre##first(struct trp_root *treap) \
+{ \
+	uint32_t ret; \
+	trpn_first(a_base, a_field, treap->trp_root, ret); \
+	return trpn_pointer(a_base, ret); \
+} \
+a_attr a_type MAYBE_UNUSED *a_pre##next(struct trp_root *treap, a_type *node) \
+{ \
+	uint32_t ret; \
+	uint32_t offset = trpn_offset(a_base, node); \
+	if (~trp_right_get(a_base, a_field, offset)) { \
+		trpn_first(a_base, a_field, \
+			trp_right_get(a_base, a_field, offset), ret); \
+	} else { \
+		uint32_t tnode = treap->trp_root; \
+		ret = ~0; \
+		while (1) { \
+			int cmp = (a_cmp)(trpn_pointer(a_base, offset), \
+				trpn_pointer(a_base, tnode)); \
+			if (cmp < 0) { \
+				ret = tnode; \
+				tnode = trp_left_get(a_base, a_field, tnode); \
+			} else if (cmp > 0) { \
+				tnode = trp_right_get(a_base, a_field, tnode); \
+			} else { \
+				break; \
+			} \
+		} \
+	} \
+	return trpn_pointer(a_base, ret); \
+} \
+a_attr a_type MAYBE_UNUSED *a_pre##search(struct trp_root *treap, a_type *key) \
+{ \
+	int cmp; \
+	uint32_t ret = treap->trp_root; \
+	while (~ret && (cmp = (a_cmp)(key, trpn_pointer(a_base,ret)))) { \
+		if (cmp < 0) \
+			ret = trp_left_get(a_base, a_field, ret); \
+		else \
+			ret = trp_right_get(a_base, a_field, ret); \
+	} \
+	return trpn_pointer(a_base, ret); \
+} \
+a_attr uint32_t MAYBE_UNUSED a_pre##insert_recurse(uint32_t cur_node, uint32_t ins_node) \
+{ \
+	if (cur_node == ~0) { \
+		return (ins_node); \
+	} else { \
+		uint32_t ret; \
+		int cmp = (a_cmp)(trpn_pointer(a_base, ins_node), \
+					trpn_pointer(a_base, cur_node)); \
+		if (cmp < 0) { \
+			uint32_t left = a_pre##insert_recurse( \
+				trp_left_get(a_base, a_field, cur_node), ins_node); \
+			trp_left_set(a_base, a_field, cur_node, left); \
+			if (trp_prio_get(left) < trp_prio_get(cur_node)) \
+				trpn_rotate_right(a_base, a_field, cur_node, ret); \
+			else \
+				ret = cur_node; \
+		} else { \
+			uint32_t right = a_pre##insert_recurse( \
+				trp_right_get(a_base, a_field, cur_node), ins_node); \
+			trp_right_set(a_base, a_field, cur_node, right); \
+			if (trp_prio_get(right) < trp_prio_get(cur_node)) \
+				trpn_rotate_left(a_base, a_field, cur_node, ret); \
+			else \
+				ret = cur_node; \
+		} \
+		return (ret); \
+	} \
+} \
+a_attr void MAYBE_UNUSED a_pre##insert(struct trp_root *treap, a_type *node) \
+{ \
+	uint32_t offset = trpn_offset(a_base, node); \
+	trp_node_new(a_base, a_field, offset); \
+	treap->trp_root = a_pre##insert_recurse(treap->trp_root, offset); \
+} \
+a_attr uint32_t MAYBE_UNUSED a_pre##remove_recurse(uint32_t cur_node, uint32_t rem_node) \
+{ \
+	int cmp = a_cmp(trpn_pointer(a_base, rem_node), \
+			trpn_pointer(a_base, cur_node)); \
+	if (cmp == 0) { \
+		uint32_t ret; \
+		uint32_t left = trp_left_get(a_base, a_field, cur_node); \
+		uint32_t right = trp_right_get(a_base, a_field, cur_node); \
+		if (left == ~0) { \
+			if (right == ~0) \
+				return (~0); \
+		} else if (right == ~0 || trp_prio_get(left) < trp_prio_get(right)) { \
+			trpn_rotate_right(a_base, a_field, cur_node, ret); \
+			right = a_pre##remove_recurse(cur_node, rem_node); \
+			trp_right_set(a_base, a_field, ret, right); \
+			return (ret); \
+		} \
+		trpn_rotate_left(a_base, a_field, cur_node, ret); \
+		left = a_pre##remove_recurse(cur_node, rem_node); \
+		trp_left_set(a_base, a_field, ret, left); \
+		return (ret); \
+	} else if (cmp < 0) { \
+		uint32_t left = a_pre##remove_recurse( \
+			trp_left_get(a_base, a_field, cur_node), rem_node); \
+		trp_left_set(a_base, a_field, cur_node, left); \
+		return (cur_node); \
+	} else { \
+		uint32_t right = a_pre##remove_recurse( \
+			trp_right_get(a_base, a_field, cur_node), rem_node); \
+		trp_right_set(a_base, a_field, cur_node, right); \
+		return (cur_node); \
+	} \
+} \
+a_attr void MAYBE_UNUSED a_pre##remove(struct trp_root *treap, a_type *node) \
+{ \
+	treap->trp_root = a_pre##remove_recurse(treap->trp_root, \
+		trpn_offset(a_base, node)); \
+} \
+
+#endif
diff --git a/vcs-svn/trp.txt b/vcs-svn/trp.txt
new file mode 100644
index 0000000..943c385
--- /dev/null
+++ b/vcs-svn/trp.txt
@@ -0,0 +1,102 @@
+Motivation
+==========
+
+Treaps provide a memory-efficient binary search tree structure.
+Insertion/deletion/search are about as fast in the average
+case as red-black trees and the chances of worst-case behavior are
+vanishingly small, thanks to (pseudo-)randomness.  The bad worst-case
+behavior is a small price to pay, given that treaps are much simpler
+to implement.
+    
+From http://www.canonware.com/download/trp/trp_hash/trp.h
+
+API
+===
+
+The trp API generates a data structure and functions to handle a
+large growing set of objects stored in a pool.
+
+The caller:
+
+. Specifies parameters for the generated functions with the
+  trp_gen(static, foo_, ...) macro.
+
+. Allocates a `struct trp_root` variable and sets it to `{~0}`.
+
+. Adds new items to the set using `foo_insert`.
+
+. Can find a specific item in the set using `foo_search`.
+
+. Can iterate over items in the set using `foo_first` and `foo_next`.
+
+. Can remove an item from the set using `foo_remove`.
+
+. The set is never freed.
+
+Example:
+
+----
+struct ex_node {
+	const char *s;
+	struct trp_node ex_link;
+};
+static struct trp_root ex_base = {~0};
+obj_pool_gen(ex, struct ex_node, 4096);
+trp_gen(static, ex_, struct ex_node, ex_link, ex, strcmp)
+struct ex_node *item;
+
+item = ex_pointer(ex_alloc(1));
+item->s = "hello";
+ex_insert(&ex_base, item);
+item = ex_pointer(ex_alloc(1));
+item->s = "goodbye";
+ex_insert(&ex_base, item);
+for (item = ex_first(&ex_base); item; item = ex_next(&ex_base, item))
+	printf("%s\n", item->s);
+----
+
+Functions
+---------
+
+trp_gen(attr, foo_, node_type, link_field, pool, cmp)::
+
+	Generate a type-specific treap implementation.
++
+. The storage class for generated functions will be 'attr' (e.g., `static`).
+. Generated function names are prefixed with 'foo_' (e.g., `treap_`).
+. Treap nodes will be of type 'node_type' (e.g., `struct treap_node`).
+  This type must be a struct with at least one `struct trp_node` field
+  to point to its children.
+. The field used to access child nodes will be 'link_field'.
+. All treap nodes must lie in the 'pool' object pool.
+. Treap nodes must be totally ordered by the 'cmp' relation, with the
+  following prototype:
++
+int (*cmp)(node_type \*a, node_type \*b)
++
+and returning a value less than, equal to, or greater than zero
+according to the result of comparison.
+
+void foo_insert(struct trp_root *treap, node_type \*node)::
+
+	Insert node into treap.  If inserted multiple times,
+	a node will appear in the treap multiple times.
+
+void foo_remove(struct trp_root *treap, node_type \*node)::
+
+	Remove node from treap.  Caller must ensure node is
+	present in treap before using this function.
+
+node_type *foo_search(struct trp_root \*treap, node_type \*key)::
+
+	Search for a node that matches key.  If no match is found,
+	return NULL: the search loop stops at the first empty link
+	and does not look for key's successor.
+
+node_type *foo_first(struct trp_root \*treap)::
+
+	Find the first item from the treap, in sorted order.
+
+node_type *foo_next(struct trp_root \*treap, node_type \*node)::
+
+	Find the next item.
-- 
1.7.1


* [PATCH 5/8] Add string-specific memory pool
  2010-07-15 16:22 [PATCH 0/8] Resurrect rr/svn-export Ramkumar Ramachandra
                   ` (3 preceding siblings ...)
  2010-07-15 16:23 ` [PATCH 4/8] Add treap implementation Ramkumar Ramachandra
@ 2010-07-15 16:23 ` Ramkumar Ramachandra
  2010-07-15 16:23 ` [PATCH 6/8] Add stream helper library Ramkumar Ramachandra
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-07-15 16:23 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier, Junio C Hamano

From: David Barr <david.barr@cordelta.com>

Intern strings so they can be compared by address and stored without
wasting space.

This library uses the macros in obj_pool.h and trp.h to create a
memory pool for strings and expose an API for handling them.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Makefile              |    4 +-
 vcs-svn/string_pool.c |  114 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/string_pool.h |   15 ++++++
 3 files changed, 131 insertions(+), 2 deletions(-)
 create mode 100644 vcs-svn/string_pool.c
 create mode 100644 vcs-svn/string_pool.h
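
The idea of interning can be sketched without the pool and treap
machinery.  The toy below uses a fixed table and a linear search, with
hypothetical names (toy_intern, toy_fetch); it is not string_pool.c and
makes no attempt at efficiency or persistence.  It only shows that equal
strings map to the same handle, so later comparisons become integer
comparisons and each distinct string is stored once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX 32	/* at most this many distinct strings */
#define TOY_LEN 64	/* each string, with its NUL, fits in this */

static char toy_table[TOY_MAX][TOY_LEN];
static uint32_t toy_count;

/* Return a stable handle for key, or ~0 for a NULL key. */
static uint32_t toy_intern(const char *key)
{
	uint32_t i;

	if (!key)
		return ~(uint32_t)0;
	for (i = 0; i < toy_count; i++)
		if (!strcmp(toy_table[i], key))
			return i;	/* already interned */
	assert(toy_count < TOY_MAX && strlen(key) < TOY_LEN);
	strcpy(toy_table[toy_count], key);
	return toy_count++;
}

/* Map a handle back to its string, or NULL if it is out of range. */
static const char *toy_fetch(uint32_t handle)
{
	return handle < toy_count ? toy_table[handle] : NULL;
}
```

In the patch itself the linear search is replaced by a treap lookup and
the fixed table by two growable pools, but the contract is the same:
pool_intern() returns a handle and pool_fetch() turns it back into a
string.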

diff --git a/Makefile b/Makefile
index 663a366..e11e588 100644
--- a/Makefile
+++ b/Makefile
@@ -1740,7 +1740,7 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-VCSSVN_OBJS =
+VCSSVN_OBJS = vcs-svn/string_pool.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1864,7 +1864,7 @@ xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
 $(VCSSVN_OBJS): \
-	vcs-svn/obj_pool.h vcs-svn/trp.h
+	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/vcs-svn/string_pool.c b/vcs-svn/string_pool.c
new file mode 100644
index 0000000..bd5a380
--- /dev/null
+++ b/vcs-svn/string_pool.c
@@ -0,0 +1,114 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+#include "trp.h"
+#include "obj_pool.h"
+#include "string_pool.h"
+
+static struct trp_root tree = { ~0 };
+
+struct node {
+	uint32_t offset;
+	struct trp_node children;
+};
+
+/* Two memory pools: one for struct node, and another for strings */
+obj_pool_gen(node, struct node, 4096);
+obj_pool_gen(string, char, 4096);
+
+static char *node_value(struct node *node)
+{
+	return node ? string_pointer(node->offset) : NULL;
+}
+
+static int node_cmp(struct node *a, struct node *b)
+{
+	return strcmp(node_value(a), node_value(b));
+}
+
+/* Build a Treap from the node structure (a trp_node w/ offset) */
+trp_gen(static, tree_, struct node, children, node, node_cmp);
+
+char *pool_fetch(uint32_t entry)
+{
+	return node_value(node_pointer(entry));
+}
+
+uint32_t pool_intern(char *key)
+{
+	/* Canonicalize key */
+	struct node *match = NULL, *node;
+	uint32_t key_len;
+	if (key == NULL)
+		return ~0;
+	key_len = strlen(key) + 1;
+	node = node_pointer(node_alloc(1));
+	node->offset = string_alloc(key_len);
+	strcpy(node_value(node), key);
+	match = tree_search(&tree, node);
+	if (!match) {
+		tree_insert(&tree, node);
+	} else {
+		node_free(1);
+		string_free(key_len);
+		node = match;
+	}
+	return node_offset(node);
+}
+
+uint32_t pool_tok_r(char *str, const char *delim, char **saveptr)
+{
+	char *token = strtok_r(str, delim, saveptr);
+	return token ? pool_intern(token) : ~0;
+}
+
+void pool_print_seq(uint32_t len, uint32_t *seq, char delim, FILE *stream)
+{
+	uint32_t i;
+	for (i = 0; i < len && ~seq[i]; i++) {
+		fputs(pool_fetch(seq[i]), stream);
+		if (i < len - 1 && ~seq[i + 1])
+			fputc(delim, stream);
+	}
+}
+
+uint32_t pool_tok_seq(uint32_t max, uint32_t *seq, char *delim, char *str)
+{
+	char *context = NULL;
+	uint32_t length = 0, token = str ? pool_tok_r(str, delim, &context) : ~0;
+	while (length < max) {
+		seq[length++] = token;
+		if (token == ~0)
+			break;
+		token = pool_tok_r(NULL, delim, &context);
+	}
+	seq[length ? length - 1 : 0] = ~0;
+	return length;
+}
+
+void pool_init(void)
+{
+	uint32_t node;
+	uint32_t string = 0;
+	string_init();
+	while (string < string_pool.size) {
+		node = node_alloc(1);
+		node_pointer(node)->offset = string;
+		tree_insert(&tree, node_pointer(node));
+		string += strlen(string_pointer(string)) + 1;
+	}
+}
+
+void pool_commit(void)
+{
+	string_commit();
+}
+
+void pool_reset(void)
+{
+	node_reset();
+	string_reset();
+}
diff --git a/vcs-svn/string_pool.h b/vcs-svn/string_pool.h
new file mode 100644
index 0000000..085e6d7
--- /dev/null
+++ b/vcs-svn/string_pool.h
@@ -0,0 +1,15 @@
+#ifndef STRING_POOL_H_
+#define STRING_POOL_H_
+
+#include "git-compat-util.h"
+
+uint32_t pool_intern(char *key);
+char *pool_fetch(uint32_t entry);
+uint32_t pool_tok_r(char *str, const char *delim, char **saveptr);
+void pool_print_seq(uint32_t len, uint32_t *seq, char delim, FILE *stream);
+uint32_t pool_tok_seq(uint32_t max, uint32_t *seq, char *delim, char *str);
+void pool_init(void);
+void pool_commit(void);
+void pool_reset(void);
+
+#endif
-- 
1.7.1

* [PATCH 6/8] Add stream helper library
  2010-07-15 16:22 [PATCH 0/8] Resurrect rr/svn-export Ramkumar Ramachandra
                   ` (4 preceding siblings ...)
  2010-07-15 16:23 ` [PATCH 5/8] Add string-specific memory pool Ramkumar Ramachandra
@ 2010-07-15 16:23 ` Ramkumar Ramachandra
  2010-07-15 19:19   ` Jonathan Nieder
  2010-07-15 16:23 ` [PATCH 7/8] Add infrastructure to write revisions in fast-export format Ramkumar Ramachandra
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-07-15 16:23 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier, Junio C Hamano

From: David Barr <david.barr@cordelta.com>

This library provides thread-unsafe fgets()- and fread()-like
functions where the caller does not have to supply a buffer.  It
maintains a couple of static buffers and provides an API to use
them.

NEEDSWORK: what should buffer_copy_bytes do on error?

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Makefile              |    5 ++-
 vcs-svn/line_buffer.c |   93 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/line_buffer.h |   14 +++++++
 3 files changed, 110 insertions(+), 2 deletions(-)
 create mode 100644 vcs-svn/line_buffer.c
 create mode 100644 vcs-svn/line_buffer.h
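
As a rough illustration of the fgets()-like behaviour described above,
here is a sketch of a line reader that strips the trailing newline.
Unlike the library in this patch, it takes the stream and buffer from
the caller so that it can be tried on its own; sketch_read_line is a
hypothetical name, and the extra check for an empty string is an
addition of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return buf with the trailing newline removed, or NULL on error,
 * end of input, or a line too long to fit in buf.
 */
static char *sketch_read_line(FILE *in, char *buf, size_t size)
{
	size_t len;
	assert(in && buf && size > 1);
	if (!fgets(buf, (int)size, in))
		return NULL;		/* error or data exhausted */
	len = strlen(buf);
	if (len && buf[len - 1] == '\n')
		buf[len - 1] = '\0';	/* strip the newline */
	else if (!feof(in))
		return NULL;		/* line did not fit in buf */
	return buf;
}
```

A final line without a newline is still returned, because fgets() has
hit end of file by then; only an over-long line is reported as an
error.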

diff --git a/Makefile b/Makefile
index e11e588..8223d9b 100644
--- a/Makefile
+++ b/Makefile
@@ -1740,7 +1740,7 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-VCSSVN_OBJS = vcs-svn/string_pool.o
+VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1864,7 +1864,8 @@ xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
 $(VCSSVN_OBJS): \
-	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h
+	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h \
+	vcs-svn/line_buffer.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
new file mode 100644
index 0000000..0f83426
--- /dev/null
+++ b/vcs-svn/line_buffer.c
@@ -0,0 +1,93 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+
+#include "line_buffer.h"
+#include "obj_pool.h"
+
+#define LINE_BUFFER_LEN 10000
+#define COPY_BUFFER_LEN 4096
+
+/* Create memory pool for char sequence of known length */
+obj_pool_gen(blob, char, 4096);
+
+static char line_buffer[LINE_BUFFER_LEN];
+static char byte_buffer[COPY_BUFFER_LEN];
+static FILE *infile;
+
+int buffer_init(const char *filename)
+{
+	infile = filename ? fopen(filename, "r") : stdin;
+	if (!infile)
+		return -1;
+	return 0;
+}
+
+int buffer_deinit(void)
+{
+	fclose(infile);
+	return 0;
+}
+
+/* Read a line without trailing newline. */
+char *buffer_read_line(void)
+{
+	char *end;
+	if (!fgets(line_buffer, sizeof(line_buffer), infile))
+		/* Error or data exhausted. */
+		return NULL;
+	end = line_buffer + strlen(line_buffer);
+	if (end[-1] == '\n')
+		end[-1] = '\0';
+	else if (feof(infile))
+		; /* No newline at end of file.  That's fine. */
+	else
+		/*
+		 * Line was too long.
+		 * There is probably a saner way to deal with this,
+		 * but for now let's return an error.
+		 */
+		return NULL;
+	return line_buffer;
+}
+
+char *buffer_read_string(uint32_t len)
+{
+	char *s;
+	blob_free(blob_pool.size);
+	s = blob_pointer(blob_alloc(len + 1));
+	s[fread(s, 1, len, infile)] = '\0';
+	return ferror(infile) ? NULL : s;
+}
+
+void buffer_copy_bytes(uint32_t len)
+{
+	uint32_t in;
+	while (len > 0 && !feof(infile)) {
+		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
+		in = fread(byte_buffer, 1, in, infile);
+		len -= in;
+		fwrite(byte_buffer, 1, in, stdout);
+		if (ferror(infile) || ferror(stdout))
+			/* NEEDSWORK: handle error. */
+			break;
+	}
+}
+
+void buffer_skip_bytes(uint32_t len)
+{
+	uint32_t in;
+	while (len > 0 && !feof(infile) && !ferror(infile)) {
+		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
+		in = fread(byte_buffer, 1, in, infile);
+		len -= in;
+	}
+}
+
+void buffer_reset(void)
+{
+	blob_reset();
+}
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
new file mode 100644
index 0000000..631d1df
--- /dev/null
+++ b/vcs-svn/line_buffer.h
@@ -0,0 +1,14 @@
+#ifndef LINE_BUFFER_H_
+#define LINE_BUFFER_H_
+
+#include "git-compat-util.h"
+
+int buffer_init(const char *filename);
+int buffer_deinit(void);
+char *buffer_read_line(void);
+char *buffer_read_string(uint32_t len);
+void buffer_copy_bytes(uint32_t len);
+void buffer_skip_bytes(uint32_t len);
+void buffer_reset(void);
+
+#endif
-- 
1.7.1

* [PATCH 7/8] Add infrastructure to write revisions in fast-export format
  2010-07-15 16:22 [PATCH 0/8] Resurrect rr/svn-export Ramkumar Ramachandra
                   ` (5 preceding siblings ...)
  2010-07-15 16:23 ` [PATCH 6/8] Add stream helper library Ramkumar Ramachandra
@ 2010-07-15 16:23 ` Ramkumar Ramachandra
  2010-07-15 19:28   ` Jonathan Nieder
  2010-07-15 16:23 ` [PATCH 8/8] Add SVN dump parser Ramkumar Ramachandra
  2010-07-16 10:13 ` [PATCH 0/8] Resurrect rr/svn-export Jonathan Nieder
  8 siblings, 1 reply; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-07-15 16:23 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier, Junio C Hamano

From: David Barr <david.barr@cordelta.com>

repo_tree maintains the exporter's state and provides a facility to
call fast_export, which writes objects to stdout suitable for
consumption by fast-import.

The exported functions roughly correspond to Subversion FS operations.

 . repo_add, repo_modify, repo_copy, repo_replace, and repo_delete
   update the current commit, based roughly on the corresponding
   Subversion FS operation.

 . repo_commit calls out to fast_export to write the current commit to
   the fast-import stream in stdout.

 . repo_diff is used by the fast_export module to write the changes
   for a commit.

 . repo_reset erases the exporter's state, so valgrind can be happy.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Makefile              |    5 +-
 vcs-svn/fast_export.c |   75 +++++++++++
 vcs-svn/fast_export.h |   14 ++
 vcs-svn/repo_tree.c   |  335 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/repo_tree.h   |   26 ++++
 5 files changed, 453 insertions(+), 2 deletions(-)
 create mode 100644 vcs-svn/fast_export.c
 create mode 100644 vcs-svn/fast_export.h
 create mode 100644 vcs-svn/repo_tree.c
 create mode 100644 vcs-svn/repo_tree.h
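
For readers unfamiliar with the fast-import stream mentioned above,
the following sketch formats the two file commands that a commit's
change list is made of: "M <mode> :<mark> <path>" and "D <path>". It
writes into caller-supplied buffers rather than stdout and takes plain
C strings instead of interned path components; all sketch_* names are
hypothetical and not part of this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Join depth path components with '/' into out; returns out. */
static char *sketch_join_path(char *out, size_t size,
			      const char *const *parts, size_t depth)
{
	size_t i, used = 0;
	assert(out && size > 0);
	out[0] = '\0';
	for (i = 0; i < depth && used < size - 1; i++) {
		int n = snprintf(out + used, size - used, "%s%s",
				 i ? "/" : "", parts[i]);
		if (n < 0)
			break;
		/* On truncation used reaches or passes size - 1 and the loop stops. */
		used += (size_t)n;
	}
	return out;
}

/* Format a filemodify line: "M <mode> :<mark> <path>\n". */
static int sketch_modify_line(char *out, size_t size, uint32_t mode,
			      uint32_t mark, const char *path)
{
	return snprintf(out, size, "M %06o :%u %s\n",
			(unsigned)mode, (unsigned)mark, path);
}

/* Format a filedelete line: "D <path>\n". */
static int sketch_delete_line(char *out, size_t size, const char *path)
{
	return snprintf(out, size, "D %s\n", path);
}
```

The mode is printed in octal, so a regular file shows up as 100644 in
the stream.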

diff --git a/Makefile b/Makefile
index 8223d9b..7c66dcc 100644
--- a/Makefile
+++ b/Makefile
@@ -1740,7 +1740,8 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o
+VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o \
+	vcs-svn/repo_tree.o vcs-svn/fast_export.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1865,7 +1866,7 @@ xdiff-interface.o $(XDIFF_OBJS): \
 
 $(VCSSVN_OBJS): \
 	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h \
-	vcs-svn/line_buffer.h
+	vcs-svn/line_buffer.h vcs-svn/repo_tree.h vcs-svn/fast_export.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
new file mode 100644
index 0000000..7552803
--- /dev/null
+++ b/vcs-svn/fast_export.c
@@ -0,0 +1,75 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+
+#include "fast_export.h"
+#include "line_buffer.h"
+#include "repo_tree.h"
+#include "string_pool.h"
+
+#define MAX_GITSVN_LINE_LEN 4096
+
+static uint32_t first_commit_done;
+
+void fast_export_delete(uint32_t depth, uint32_t *path)
+{
+	putchar('D');
+	putchar(' ');
+	pool_print_seq(depth, path, '/', stdout);
+	putchar('\n');
+}
+
+void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
+                        uint32_t mark)
+{
+	/* Mode must be 100644, 100755, 120000, or 160000. */
+	printf("M %06o :%d ", mode, mark);
+	pool_print_seq(depth, path, '/', stdout);
+	putchar('\n');
+}
+
+static char gitsvnline[MAX_GITSVN_LINE_LEN];
+void fast_export_commit(uint32_t revision, uint32_t author, char *log,
+			uint32_t uuid, uint32_t url,
+			unsigned long timestamp)
+{
+	if (!log)
+		log = "";
+	if (~uuid && ~url) {
+		snprintf(gitsvnline, MAX_GITSVN_LINE_LEN, "\n\ngit-svn-id: %s@%d %s\n",
+				 pool_fetch(url), revision, pool_fetch(uuid));
+	} else {
+		*gitsvnline = '\0';
+	}
+	printf("commit refs/heads/master\n");
+	printf("committer %s <%s@%s> %ld +0000\n",
+		   ~author ? pool_fetch(author) : "nobody",
+		   ~author ? pool_fetch(author) : "nobody",
+		   ~uuid ? pool_fetch(uuid) : "local", timestamp);
+	printf("data %zd\n%s%s\n",
+		   strlen(log) + strlen(gitsvnline), log, gitsvnline);
+	if (!first_commit_done) {
+		if (revision > 1)
+			printf("from refs/heads/master^0\n");
+		first_commit_done = 1;
+	}
+	repo_diff(revision - 1, revision);
+	fputc('\n', stdout);
+
+	printf("progress Imported commit %d.\n\n", revision);
+}
+
+void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len)
+{
+	if (mode == REPO_MODE_LNK) {
+		/* svn symlink blobs start with "link " */
+		buffer_skip_bytes(5);
+		len -= 5;
+	}
+	printf("blob\nmark :%d\ndata %d\n", mark, len);
+	buffer_copy_bytes(len);
+	fputc('\n', stdout);
+}
diff --git a/vcs-svn/fast_export.h b/vcs-svn/fast_export.h
new file mode 100644
index 0000000..47e8f56
--- /dev/null
+++ b/vcs-svn/fast_export.h
@@ -0,0 +1,14 @@
+#ifndef FAST_EXPORT_H_
+#define FAST_EXPORT_H_
+
+#include <stdint.h>
+#include <time.h>
+
+void fast_export_delete(uint32_t depth, uint32_t *path);
+void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
+			uint32_t mark);
+void fast_export_commit(uint32_t revision, uint32_t author, char *log,
+			uint32_t uuid, uint32_t url, unsigned long timestamp);
+void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len);
+
+#endif
diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
new file mode 100644
index 0000000..59a7434
--- /dev/null
+++ b/vcs-svn/repo_tree.c
@@ -0,0 +1,335 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+
+#include "string_pool.h"
+#include "repo_tree.h"
+#include "obj_pool.h"
+#include "fast_export.h"
+
+#include "trp.h"
+
+struct repo_dirent {
+	uint32_t name_offset;
+	struct trp_node children;
+	uint32_t mode;
+	uint32_t content_offset;
+};
+
+struct repo_dir {
+	struct trp_root entries;
+};
+
+struct repo_commit {
+	uint32_t root_dir_offset;
+};
+
+/* Memory pools for commit, dir and dirent */
+obj_pool_gen(commit, struct repo_commit, 4096);
+obj_pool_gen(dir, struct repo_dir, 4096);
+obj_pool_gen(dirent, struct repo_dirent, 4096);
+
+static uint32_t active_commit;
+static uint32_t mark;
+
+static int repo_dirent_name_cmp(const void *a, const void *b);
+
+/* Treap for directory entries */
+trp_gen(static, dirent_, struct repo_dirent, children, dirent, repo_dirent_name_cmp);
+
+uint32_t next_blob_mark(void)
+{
+	return mark++;
+}
+
+static struct repo_dir *repo_commit_root_dir(struct repo_commit *commit)
+{
+	return dir_pointer(commit->root_dir_offset);
+}
+
+static struct repo_dirent *repo_first_dirent(struct repo_dir *dir)
+{
+	return dirent_first(&dir->entries);
+}
+
+static int repo_dirent_name_cmp(const void *a, const void *b)
+{
+	const struct repo_dirent *dirent1 = a, *dirent2 = b;
+	uint32_t a_offset = dirent1->name_offset;
+	uint32_t b_offset = dirent2->name_offset;
+	return (a_offset > b_offset) - (a_offset < b_offset);
+}
+
+static int repo_dirent_is_dir(struct repo_dirent *dirent)
+{
+	return dirent != NULL && dirent->mode == REPO_MODE_DIR;
+}
+
+static struct repo_dir *repo_dir_from_dirent(struct repo_dirent *dirent)
+{
+	if (!repo_dirent_is_dir(dirent))
+		return NULL;
+	return dir_pointer(dirent->content_offset);
+}
+
+static struct repo_dir *repo_clone_dir(struct repo_dir *orig_dir)
+{
+	uint32_t orig_o, new_o;
+	orig_o = dir_offset(orig_dir);
+	if (orig_o >= dir_pool.committed)
+		return orig_dir;
+	new_o = dir_alloc(1);
+	orig_dir = dir_pointer(orig_o);
+	*dir_pointer(new_o) = *orig_dir;
+	return dir_pointer(new_o);
+}
+
+static struct repo_dirent *repo_read_dirent(uint32_t revision, uint32_t *path)
+{
+	uint32_t name = 0;
+	struct repo_dirent *key = dirent_pointer(dirent_alloc(1));
+	struct repo_dir *dir = NULL;
+	struct repo_dirent *dirent = NULL;
+	dir = repo_commit_root_dir(commit_pointer(revision));
+	while (~(name = *path++)) {
+		key->name_offset = name;
+		dirent = dirent_search(&dir->entries, key);
+		if (dirent == NULL || !repo_dirent_is_dir(dirent))
+			break;
+		dir = repo_dir_from_dirent(dirent);
+	}
+	dirent_free(1);
+	return dirent;
+}
+
+static void repo_write_dirent(uint32_t *path, uint32_t mode,
+                              uint32_t content_offset, uint32_t del)
+{
+	uint32_t name, revision, dir_o = ~0, parent_dir_o = ~0;
+	struct repo_dir *dir;
+	struct repo_dirent *key;
+	struct repo_dirent *dirent = NULL;
+	revision = active_commit;
+	dir = repo_commit_root_dir(commit_pointer(revision));
+	dir = repo_clone_dir(dir);
+	commit_pointer(revision)->root_dir_offset = dir_offset(dir);
+	while (~(name = *path++)) {
+		parent_dir_o = dir_offset(dir);
+
+		key = dirent_pointer(dirent_alloc(1));
+		key->name_offset = name;
+
+		dirent = dirent_search(&dir->entries, key);
+		if (dirent == NULL)
+			dirent = key;
+		else
+			dirent_free(1);
+
+		if (dirent == key) {
+			dirent->mode = REPO_MODE_DIR;
+			dirent->content_offset = 0;
+			dirent_insert(&dir->entries, dirent);
+		}
+
+		if (dirent_offset(dirent) < dirent_pool.committed) {
+			dir_o = repo_dirent_is_dir(dirent) ?
+					dirent->content_offset : ~0;
+			dirent_remove(&dir->entries, dirent);
+			dirent = dirent_pointer(dirent_alloc(1));
+			dirent->name_offset = name;
+			dirent->mode = REPO_MODE_DIR;
+			dirent->content_offset = dir_o;
+			dirent_insert(&dir->entries, dirent);
+		}
+
+		dir = repo_dir_from_dirent(dirent);
+		dir = repo_clone_dir(dir);
+		dirent->content_offset = dir_offset(dir);
+	}
+	if (dirent == NULL)
+		return;
+	dirent->mode = mode;
+	dirent->content_offset = content_offset;
+	if (del && ~parent_dir_o)
+		dirent_remove(&dir_pointer(parent_dir_o)->entries, dirent);
+}
+
+uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst)
+{
+	uint32_t mode = 0, content_offset = 0;
+	struct repo_dirent *src_dirent;
+	src_dirent = repo_read_dirent(revision, src);
+	if (src_dirent != NULL) {
+		mode = src_dirent->mode;
+		content_offset = src_dirent->content_offset;
+		repo_write_dirent(dst, mode, content_offset, 0);
+	}
+	return mode;
+}
+
+void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark)
+{
+	repo_write_dirent(path, mode, blob_mark, 0);
+}
+
+uint32_t repo_replace(uint32_t *path, uint32_t blob_mark)
+{
+	uint32_t mode = 0;
+	struct repo_dirent *src_dirent;
+	src_dirent = repo_read_dirent(active_commit, path);
+	if (src_dirent != NULL) {
+		mode = src_dirent->mode;
+		repo_write_dirent(path, mode, blob_mark, 0);
+	}
+	return mode;
+}
+
+void repo_modify(uint32_t *path, uint32_t mode, uint32_t blob_mark)
+{
+	struct repo_dirent *src_dirent;
+	src_dirent = repo_read_dirent(active_commit, path);
+	if (src_dirent != NULL && blob_mark == 0)
+		blob_mark = src_dirent->content_offset;
+	repo_write_dirent(path, mode, blob_mark, 0);
+}
+
+void repo_delete(uint32_t *path)
+{
+	repo_write_dirent(path, 0, 0, 1);
+}
+
+static void repo_git_add_r(uint32_t depth, uint32_t *path, struct repo_dir *dir);
+
+static void repo_git_add(uint32_t depth, uint32_t *path, struct repo_dirent *dirent)
+{
+	if (repo_dirent_is_dir(dirent))
+		repo_git_add_r(depth, path, repo_dir_from_dirent(dirent));
+	else
+		fast_export_modify(depth, path,
+		                   dirent->mode, dirent->content_offset);
+}
+
+static void repo_git_add_r(uint32_t depth, uint32_t *path, struct repo_dir *dir)
+{
+	struct repo_dirent *de = repo_first_dirent(dir);
+	while (de) {
+		path[depth] = de->name_offset;
+		repo_git_add(depth + 1, path, de);
+		de = dirent_next(&dir->entries, de);
+	}
+}
+
+static void repo_diff_r(uint32_t depth, uint32_t *path, struct repo_dir *dir1,
+                        struct repo_dir *dir2)
+{
+	struct repo_dirent *de1, *de2;
+	de1 = repo_first_dirent(dir1);
+	de2 = repo_first_dirent(dir2);
+
+	while (de1 && de2) {
+		if (de1->name_offset < de2->name_offset) {
+			path[depth] = de1->name_offset;
+			fast_export_delete(depth + 1, path);
+			de1 = dirent_next(&dir1->entries, de1);
+			continue;
+		}
+		if (de1->name_offset > de2->name_offset) {
+			path[depth] = de2->name_offset;
+			repo_git_add(depth + 1, path, de2);
+			de2 = dirent_next(&dir2->entries, de2);
+			continue;
+		}
+		path[depth] = de1->name_offset;
+
+		if (de1->mode == de2->mode &&
+		    de1->content_offset == de2->content_offset) {
+			; /* No change. */
+		} else if (repo_dirent_is_dir(de1) && repo_dirent_is_dir(de2)) {
+			repo_diff_r(depth + 1, path,
+				    repo_dir_from_dirent(de1),
+				    repo_dir_from_dirent(de2));
+		} else if (!repo_dirent_is_dir(de1) && !repo_dirent_is_dir(de2)) {
+			repo_git_add(depth + 1, path, de2);
+		} else {
+			fast_export_delete(depth + 1, path);
+			repo_git_add(depth + 1, path, de2);
+		}
+		de1 = dirent_next(&dir1->entries, de1);
+		de2 = dirent_next(&dir2->entries, de2);
+	}
+	while (de1) {
+		path[depth] = de1->name_offset;
+		fast_export_delete(depth + 1, path);
+		de1 = dirent_next(&dir1->entries, de1);
+	}
+	while (de2) {
+		path[depth] = de2->name_offset;
+		repo_git_add(depth + 1, path, de2);
+		de2 = dirent_next(&dir2->entries, de2);
+	}
+}
+
+static uint32_t path_stack[REPO_MAX_PATH_DEPTH];
+
+void repo_diff(uint32_t r1, uint32_t r2)
+{
+	repo_diff_r(0,
+	            path_stack,
+	            repo_commit_root_dir(commit_pointer(r1)),
+	            repo_commit_root_dir(commit_pointer(r2)));
+}
+
+void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
+                 uint32_t url, unsigned long timestamp)
+{
+	fast_export_commit(revision, author, log, uuid, url, timestamp);
+	pool_commit();
+	dirent_commit();
+	dir_commit();
+	commit_commit();
+	active_commit = commit_alloc(1);
+	commit_pointer(active_commit)->root_dir_offset =
+		commit_pointer(active_commit - 1)->root_dir_offset;
+}
+
+static void mark_init(void)
+{
+	uint32_t i;
+	mark = 0;
+	for (i = 0; i < dirent_pool.size; i++)
+		if (!repo_dirent_is_dir(dirent_pointer(i)) &&
+		    dirent_pointer(i)->content_offset > mark)
+			mark = dirent_pointer(i)->content_offset;
+	mark++;
+}
+
+void repo_init(void) {
+	pool_init();
+	commit_init();
+	dir_init();
+	dirent_init();
+	mark_init();
+	if (commit_pool.size == 0) {
+		/* Create empty tree for commit 0. */
+		commit_alloc(1);
+		commit_pointer(0)->root_dir_offset = dir_alloc(1);
+		dir_pointer(0)->entries.trp_root = ~0;
+		dir_commit();
+		commit_commit();
+	}
+	/* Preallocate next commit, ready for changes. */
+	active_commit = commit_alloc(1);
+	commit_pointer(active_commit)->root_dir_offset =
+		commit_pointer(active_commit - 1)->root_dir_offset;
+}
+
+void repo_reset(void)
+{
+	pool_reset();
+	commit_reset();
+	dir_reset();
+	dirent_reset();
+}
diff --git a/vcs-svn/repo_tree.h b/vcs-svn/repo_tree.h
new file mode 100644
index 0000000..92a7a7b
--- /dev/null
+++ b/vcs-svn/repo_tree.h
@@ -0,0 +1,26 @@
+#ifndef REPO_TREE_H_
+#define REPO_TREE_H_
+
+#include "git-compat-util.h"
+
+#define REPO_MODE_DIR 0040000
+#define REPO_MODE_BLB 0100644
+#define REPO_MODE_EXE 0100755
+#define REPO_MODE_LNK 0120000
+
+#define REPO_MAX_PATH_LEN 4096
+#define REPO_MAX_PATH_DEPTH 1000
+
+uint32_t next_blob_mark(void);
+uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst);
+void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark);
+uint32_t repo_replace(uint32_t *path, uint32_t blob_mark);
+void repo_modify(uint32_t *path, uint32_t mode, uint32_t blob_mark);
+void repo_delete(uint32_t *path);
+void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
+                 uint32_t url, unsigned long timestamp);
+void repo_diff(uint32_t r1, uint32_t r2);
+void repo_init(void);
+void repo_reset(void);
+
+#endif
-- 
1.7.1

* [PATCH 8/8] Add SVN dump parser
  2010-07-15 16:22 [PATCH 0/8] Resurrect rr/svn-export Ramkumar Ramachandra
                   ` (6 preceding siblings ...)
  2010-07-15 16:23 ` [PATCH 7/8] Add infrastructure to write revisions in fast-export format Ramkumar Ramachandra
@ 2010-07-15 16:23 ` Ramkumar Ramachandra
  2010-07-15 19:52   ` Jonathan Nieder
  2010-07-16 10:13 ` [PATCH 0/8] Resurrect rr/svn-export Jonathan Nieder
  8 siblings, 1 reply; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-07-15 16:23 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier, Junio C Hamano

From: David Barr <david.barr@cordelta.com>

svndump parses data that is in SVN dumpfile format produced by
`svnadmin dump` with the help of line_buffer and uses repo_tree and
fast_export to emit a git fast-import stream.

Based roughly on com.hydrografix.svndump 0.92 from the SvnToCCase
project at <http://svn2cc.sarovar.org/>, by Stefan Hegny and
others.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Makefile          |    5 +-
 vcs-svn/LICENSE   |    4 +
 vcs-svn/svndump.c |  289 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndump.h |    8 ++
 4 files changed, 304 insertions(+), 2 deletions(-)
 create mode 100644 vcs-svn/svndump.c
 create mode 100644 vcs-svn/svndump.h
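
Dumpfile records start with "Key: value" header lines. The sketch
below shows one way to split such a line in place at the first ": ",
similar in spirit to what the parser in this patch does before
interning the key. sketch_split_header is a hypothetical helper, not a
function added by the patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Split "Key: value" in place.  On success the key is NUL-terminated
 * at line and the returned pointer addresses the value.  Returns NULL
 * and leaves line untouched if there is no ": " separator.
 */
static char *sketch_split_header(char *line)
{
	char *sep;
	assert(line);
	sep = strstr(line, ": ");
	if (!sep)
		return NULL;
	*sep = '\0';
	return sep + 2;
}
```

Only the first separator is significant, so a value such as a
Node-path containing ": " survives intact.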

diff --git a/Makefile b/Makefile
index 7c66dcc..e7b37e0 100644
--- a/Makefile
+++ b/Makefile
@@ -1741,7 +1741,7 @@ endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
 VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o \
-	vcs-svn/repo_tree.o vcs-svn/fast_export.o
+	vcs-svn/repo_tree.o vcs-svn/fast_export.o vcs-svn/svndump.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1866,7 +1866,8 @@ xdiff-interface.o $(XDIFF_OBJS): \
 
 $(VCSSVN_OBJS): \
 	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h \
-	vcs-svn/line_buffer.h vcs-svn/repo_tree.h vcs-svn/fast_export.h
+	vcs-svn/line_buffer.h vcs-svn/repo_tree.h vcs-svn/fast_export.h \
+	vcs-svn/svndump.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/vcs-svn/LICENSE b/vcs-svn/LICENSE
index a3d384c..0a5e3c4 100644
--- a/vcs-svn/LICENSE
+++ b/vcs-svn/LICENSE
@@ -4,6 +4,10 @@ All rights reserved.
 Copyright (C) 2008 Jason Evans <jasone@canonware.com>.
 All rights reserved.
 
+Copyright (C) 2005 Stefan Hegny, hydrografix Consulting GmbH,
+Frankfurt/Main, Germany
+and others, see http://svn2cc.sarovar.org
+
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions
 are met:
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
new file mode 100644
index 0000000..86714ed
--- /dev/null
+++ b/vcs-svn/svndump.c
@@ -0,0 +1,289 @@
+/*
+ * Parse and rearrange a svnadmin dump.
+ * Create the dump with:
+ * svnadmin dump --incremental -r<startrev>:<endrev> <repository> >outfile
+ *
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "cache.h"
+#include "repo_tree.h"
+#include "fast_export.h"
+#include "line_buffer.h"
+#include "obj_pool.h"
+#include "string_pool.h"
+
+#define NODEACT_REPLACE 4
+#define NODEACT_DELETE 3
+#define NODEACT_ADD 2
+#define NODEACT_CHANGE 1
+#define NODEACT_UNKNOWN 0
+
+#define DUMP_CTX 0
+#define REV_CTX  1
+#define NODE_CTX 2
+
+#define LENGTH_UNKNOWN (~0)
+#define DATE_RFC2822_LEN 31
+
+/* Create memory pool for log messages */
+obj_pool_gen(log, char, 4096);
+
+static char *log_copy(uint32_t length, char *log)
+{
+	char *buffer;
+	log_free(log_pool.size);
+	buffer = log_pointer(log_alloc(length));
+	strncpy(buffer, log, length);
+	return buffer;
+}
+
+static struct {
+	uint32_t action, propLength, textLength, srcRev, srcMode, mark, type;
+	uint32_t src[REPO_MAX_PATH_DEPTH], dst[REPO_MAX_PATH_DEPTH];
+} node_ctx;
+
+static struct {
+	uint32_t revision, author;
+	unsigned long timestamp;
+	char *log;
+} rev_ctx;
+
+static struct {
+	uint32_t uuid, url;
+} dump_ctx;
+
+static struct {
+	uint32_t svn_log, svn_author, svn_date, svn_executable, svn_special, uuid,
+		revision_number, node_path, node_kind, node_action,
+		node_copyfrom_path, node_copyfrom_rev, text_content_length,
+		prop_content_length, content_length;
+} keys;
+
+static void reset_node_ctx(char *fname)
+{
+	node_ctx.type = 0;
+	node_ctx.action = NODEACT_UNKNOWN;
+	node_ctx.propLength = LENGTH_UNKNOWN;
+	node_ctx.textLength = LENGTH_UNKNOWN;
+	node_ctx.src[0] = ~0;
+	node_ctx.srcRev = 0;
+	node_ctx.srcMode = 0;
+	pool_tok_seq(REPO_MAX_PATH_DEPTH, node_ctx.dst, "/", fname);
+	node_ctx.mark = 0;
+}
+
+static void reset_rev_ctx(uint32_t revision)
+{
+	rev_ctx.revision = revision;
+	rev_ctx.timestamp = 0;
+	rev_ctx.log = NULL;
+	rev_ctx.author = ~0;
+}
+
+static void reset_dump_ctx(uint32_t url)
+{
+	dump_ctx.url = url;
+	dump_ctx.uuid = ~0;
+}
+
+static void init_keys(void)
+{
+	keys.svn_log = pool_intern("svn:log");
+	keys.svn_author = pool_intern("svn:author");
+	keys.svn_date = pool_intern("svn:date");
+	keys.svn_executable = pool_intern("svn:executable");
+	keys.svn_special = pool_intern("svn:special");
+	keys.uuid = pool_intern("UUID");
+	keys.revision_number = pool_intern("Revision-number");
+	keys.node_path = pool_intern("Node-path");
+	keys.node_kind = pool_intern("Node-kind");
+	keys.node_action = pool_intern("Node-action");
+	keys.node_copyfrom_path = pool_intern("Node-copyfrom-path");
+	keys.node_copyfrom_rev = pool_intern("Node-copyfrom-rev");
+	keys.text_content_length = pool_intern("Text-content-length");
+	keys.prop_content_length = pool_intern("Prop-content-length");
+	keys.content_length = pool_intern("Content-length");
+}
+
+static void read_props(void)
+{
+	uint32_t len;
+	uint32_t key = ~0;
+	char *val = NULL;
+	char *t;
+	while ((t = buffer_read_line()) && strcmp(t, "PROPS-END")) {
+		if (!strncmp(t, "K ", 2)) {
+			len = atoi(&t[2]);
+			key = pool_intern(buffer_read_string(len));
+			buffer_read_line();
+		} else if (!strncmp(t, "V ", 2)) {
+			len = atoi(&t[2]);
+			val = buffer_read_string(len);
+			if (key == keys.svn_log) {
+				/* Value length excludes terminating nul. */
+				rev_ctx.log = log_copy(len + 1, val);
+			} else if (key == keys.svn_author) {
+				rev_ctx.author = pool_intern(val);
+			} else if (key == keys.svn_date) {
+				if (parse_date_basic(val, &rev_ctx.timestamp, NULL))
+					fprintf(stderr, "Invalid timestamp: %s\n", val);
+			} else if (key == keys.svn_executable) {
+				node_ctx.type = REPO_MODE_EXE;
+			} else if (key == keys.svn_special) {
+				node_ctx.type = REPO_MODE_LNK;
+			}
+			key = ~0;
+			buffer_read_line();
+		}
+	}
+}
+
+static void handle_node(void)
+{
+	if (node_ctx.propLength != LENGTH_UNKNOWN && node_ctx.propLength)
+		read_props();
+
+	if (node_ctx.srcRev)
+		node_ctx.srcMode = repo_copy(node_ctx.srcRev, node_ctx.src, node_ctx.dst);
+
+	if (node_ctx.textLength != LENGTH_UNKNOWN &&
+	    node_ctx.type != REPO_MODE_DIR)
+		node_ctx.mark = next_blob_mark();
+
+	if (node_ctx.action == NODEACT_DELETE) {
+		repo_delete(node_ctx.dst);
+	} else if (node_ctx.action == NODEACT_CHANGE ||
+			   node_ctx.action == NODEACT_REPLACE) {
+		if (node_ctx.action == NODEACT_REPLACE &&
+		    node_ctx.type == REPO_MODE_DIR)
+			repo_replace(node_ctx.dst, node_ctx.mark);
+		else if (node_ctx.propLength != LENGTH_UNKNOWN)
+			repo_modify(node_ctx.dst, node_ctx.type, node_ctx.mark);
+		else if (node_ctx.textLength != LENGTH_UNKNOWN)
+			node_ctx.srcMode = repo_replace(node_ctx.dst, node_ctx.mark);
+	} else if (node_ctx.action == NODEACT_ADD) {
+		if (node_ctx.srcRev && node_ctx.propLength != LENGTH_UNKNOWN)
+			repo_modify(node_ctx.dst, node_ctx.type, node_ctx.mark);
+		else if (node_ctx.srcRev && node_ctx.textLength != LENGTH_UNKNOWN)
+			node_ctx.srcMode = repo_replace(node_ctx.dst, node_ctx.mark);
+		else if ((node_ctx.type == REPO_MODE_DIR && !node_ctx.srcRev) ||
+		         node_ctx.textLength != LENGTH_UNKNOWN)
+			repo_add(node_ctx.dst, node_ctx.type, node_ctx.mark);
+	}
+
+	if (node_ctx.propLength == LENGTH_UNKNOWN && node_ctx.srcMode)
+		node_ctx.type = node_ctx.srcMode;
+
+	if (node_ctx.mark)
+		fast_export_blob(node_ctx.type, node_ctx.mark, node_ctx.textLength);
+	else if (node_ctx.textLength != LENGTH_UNKNOWN)
+		buffer_skip_bytes(node_ctx.textLength);
+}
+
+static void handle_revision(void)
+{
+	if (rev_ctx.revision)
+		repo_commit(rev_ctx.revision, rev_ctx.author, rev_ctx.log,
+			dump_ctx.uuid, dump_ctx.url, rev_ctx.timestamp);
+}
+
+void svndump_read(char *url)
+{
+	char *val;
+	char *t;
+	uint32_t active_ctx = DUMP_CTX;
+	uint32_t len;
+	uint32_t key;
+
+	reset_dump_ctx(pool_intern(url));
+	while ((t = buffer_read_line())) {
+		val = strstr(t, ": ");
+		if (!val)
+			continue;
+		*val++ = '\0';
+		*val++ = '\0';
+		key = pool_intern(t);
+
+		if (key == keys.uuid) {
+			dump_ctx.uuid = pool_intern(val);
+		} else if (key == keys.revision_number) {
+			if (active_ctx == NODE_CTX)
+				handle_node();
+			if (active_ctx != DUMP_CTX)
+				handle_revision();
+			active_ctx = REV_CTX;
+			reset_rev_ctx(atoi(val));
+		} else if (key == keys.node_path) {
+			if (active_ctx == NODE_CTX)
+				handle_node();
+			active_ctx = NODE_CTX;
+			reset_node_ctx(val);
+		} else if (key == keys.node_kind) {
+			if (!strcmp(val, "dir"))
+				node_ctx.type = REPO_MODE_DIR;
+			else if (!strcmp(val, "file"))
+				node_ctx.type = REPO_MODE_BLB;
+			else
+				fprintf(stderr, "Unknown node-kind: %s\n", val);
+		} else if (key == keys.node_action) {
+			if (!strcmp(val, "delete")) {
+				node_ctx.action = NODEACT_DELETE;
+			} else if (!strcmp(val, "add")) {
+				node_ctx.action = NODEACT_ADD;
+			} else if (!strcmp(val, "change")) {
+				node_ctx.action = NODEACT_CHANGE;
+			} else if (!strcmp(val, "replace")) {
+				node_ctx.action = NODEACT_REPLACE;
+			} else {
+				fprintf(stderr, "Unknown node-action: %s\n", val);
+				node_ctx.action = NODEACT_UNKNOWN;
+			}
+		} else if (key == keys.node_copyfrom_path) {
+			pool_tok_seq(REPO_MAX_PATH_DEPTH, node_ctx.src, "/", val);
+		} else if (key == keys.node_copyfrom_rev) {
+			node_ctx.srcRev = atoi(val);
+		} else if (key == keys.text_content_length) {
+			node_ctx.textLength = atoi(val);
+		} else if (key == keys.prop_content_length) {
+			node_ctx.propLength = atoi(val);
+		} else if (key == keys.content_length) {
+			len = atoi(val);
+			buffer_read_line();
+			if (active_ctx == REV_CTX) {
+				read_props();
+			} else if (active_ctx == NODE_CTX) {
+				handle_node();
+				active_ctx = REV_CTX;
+			} else {
+				fprintf(stderr, "Unexpected content length header: %d\n", len);
+				buffer_skip_bytes(len);
+			}
+		}
+	}
+	if (active_ctx == NODE_CTX)
+		handle_node();
+	if (active_ctx != DUMP_CTX)
+		handle_revision();
+}
+
+void svndump_init(const char *filename)
+{
+	buffer_init(filename);
+	repo_init();
+	reset_dump_ctx(~0);
+	reset_rev_ctx(0);
+	reset_node_ctx(NULL);
+	init_keys();
+}
+
+void svndump_reset(void)
+{
+	log_reset();
+	buffer_reset();
+	repo_reset();
+	reset_dump_ctx(~0);
+	reset_rev_ctx(0);
+	reset_node_ctx(NULL);
+}
diff --git a/vcs-svn/svndump.h b/vcs-svn/svndump.h
new file mode 100644
index 0000000..38ad544
--- /dev/null
+++ b/vcs-svn/svndump.h
@@ -0,0 +1,8 @@
+#ifndef SVNDUMP_H_
+#define SVNDUMP_H_
+
+void svndump_init(const char *filename);
+void svndump_read(char *url);
+void svndump_reset(void);
+
+#endif
-- 
1.7.1

* Re: [PATCH 1/8] Export parse_date_basic() to convert a date string to timestamp
  2010-07-15 16:22 ` [PATCH 1/8] Export parse_date_basic() to convert a date string to timestamp Ramkumar Ramachandra
@ 2010-07-15 17:25   ` Jonathan Nieder
  2010-07-15 22:54     ` Junio C Hamano
  0 siblings, 1 reply; 79+ messages in thread
From: Jonathan Nieder @ 2010-07-15 17:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier,
	Ramkumar Ramachandra

Ramkumar Ramachandra wrote:

> approxidate() is not appropriate for reading machine-written dates
> because it guesses instead of erroring out on malformed dates.
> parse_date() is less convenient since it returns its output as a
> string.  So export the underlying function that writes a timestamp.
> 
> While at it, change the return value to match the usual convention:
> return 0 for success and -1 for failure.

Junio: I think this should be ejected from the series as an
independently useful cleanup.

Currently parse_date_toffset() is exported but not declared anywhere.
This patch gives it a more predictable API and adds a declaration.

Ram: thanks for the reminder.

* Re: [PATCH 2/8] Introduce vcs-svn lib
  2010-07-15 16:22 ` [PATCH 2/8] Introduce vcs-svn lib Ramkumar Ramachandra
@ 2010-07-15 17:46   ` Jonathan Nieder
  2010-07-15 19:15     ` Ramkumar Ramachandra
  0 siblings, 1 reply; 79+ messages in thread
From: Jonathan Nieder @ 2010-07-15 17:46 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Ramkumar Ramachandra wrote:

> @@ -1908,6 +1912,8 @@ $(LIB_FILE): $(LIB_OBJS)
>  $(XDIFF_LIB): $(XDIFF_OBJS)
>  	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(XDIFF_OBJS)
>  
> +$(VCSSVN_LIB): $(VCSSVN_OBJS)
> +	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(VCSSVN_OBJS)

 $ make vcs-svn/lib.a V=1
 rm -f vcs-svn/lib.a && ar rcs vcs-svn/lib.a 
 ar: vcs-svn/lib.a: No such file or directory
 make: *** [vcs-svn/lib.a] Error 1

That is because the vcs-svn directory does not exist.  So
probably the LICENSE should be added with the same patch
(and git should learn to track empty directories).

Jonathan

* Re: [PATCH 3/8] Add memory pool library
  2010-07-15 16:22 ` [PATCH 3/8] Add memory pool library Ramkumar Ramachandra
@ 2010-07-15 18:57   ` Jonathan Nieder
  2010-07-15 19:12     ` Ramkumar Ramachandra
  0 siblings, 1 reply; 79+ messages in thread
From: Jonathan Nieder @ 2010-07-15 18:57 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Ramkumar Ramachandra wrote:

> void pre_commit(void);
> 
> 	Write the pool to file.

Except as a proof of concept, this is the wrong API to have.  The problem
is that the caller cannot choose the filename, so it ends up being a .bin
file in the current directory, wherever that is.

The log message leaves out a subtlety: this also increases the
‘committed’ value, and bookkeeping for that might be useful to some
callers.

In other words:

> +static MAYBE_UNUSED void pre##_init(void) \
> +{ \
> +	struct stat st; \
> +	pre##_pool.file = fopen(#pre ".bin", "a+"); \
> +	rewind(pre##_pool.file); \
> +	fstat(fileno(pre##_pool.file), &st); \
> +	pre##_pool.size = st.st_size / sizeof(obj_t); \
> +	pre##_pool.committed = pre##_pool.size; \
> +	pre##_pool.capacity = pre##_pool.size * 2; \
> +	if (pre##_pool.capacity < initial_capacity) \
> +		pre##_pool.capacity = initial_capacity; \
> +	pre##_pool.base = malloc(pre##_pool.capacity * sizeof(obj_t)); \
> +	fread(pre##_pool.base, sizeof(obj_t), pre##_pool.size, pre##_pool.file); \
> +} \

If you just want something working, I’d suggest stubbing this out:

 static MAYBE_UNUSED void pre##_init(void) \
 { \
 } \

It even almost makes sense as API: the _init function does all
initialization tasks required, which is to say, none.  (The {0, ...}
initializer already has taken care of setting all fields to 0).

> +static MAYBE_UNUSED void pre##_commit(void) \
> +{ \
> +	pre##_pool.committed += fwrite(pre##_pool.base + pre##_pool.committed, \
> +		sizeof(obj_t), pre##_pool.size - pre##_pool.committed, \
> +		pre##_pool.file); \
> +} \

This can be simplified

 static MAYBE_UNUSED void pre##_commit(void) \
 { \
	pre##_pool.committed = pre##_pool.size; \
 } \

In other words, maybe something like this on top?  This includes the
vestigial _init() function which really should not be there (it is
confusing that some callers use it and others don’t).  I did not
spend much time on it because in the end I suspect we might throw
obj_pool away anyway.

---
diff --git a/Makefile b/Makefile
index fc31ee0..386a586 100644
--- a/Makefile
+++ b/Makefile
@@ -409,6 +409,7 @@ TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-genrandom
 TEST_PROGRAMS_NEED_X += test-match-trees
+TEST_PROGRAMS_NEED_X += test-obj-pool
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
 TEST_PROGRAMS_NEED_X += test-run-command
diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 680d7d6..262f304 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -12,4 +12,10 @@ test_expect_success 'character classes (isspace, isalpha etc.)' '
 	test-ctype
 '
 
+test_expect_success 'allocator for svn importer' '
+	printf "%s\n" 0 0 0 0 0 2 2 2 2 2 -1 >expected &&
+	test-obj-pool >actual &&
+	test_cmp expected actual
+'
+
 test_done
diff --git a/test-obj-pool.c b/test-obj-pool.c
new file mode 100644
index 0000000..1300049
--- /dev/null
+++ b/test-obj-pool.c
@@ -0,0 +1,55 @@
+/*
+ * test-obj-pool.c: code to exercise the svn importer's object pool
+ */
+
+#include "vcs-svn/obj_pool.h"
+
+static const char usage_str[] =
+	"test-obj-pool";
+
+obj_pool_gen(int, int, 2)
+obj_pool_gen(other, int, 4096)
+
+int main(int argc, char *argv[])
+{
+	int *p;
+	int i;
+
+	if (argc != 1) {
+		fprintf(stderr, "Usage: %s\n", usage_str);
+		return 1;
+	}
+
+	int_init();
+
+	p = int_pointer(int_alloc(10));
+	for (i = 0; i < 10; i++)
+		p[i] = 0;
+
+	*other_pointer(other_alloc(1)) = -1;
+	other_commit();
+
+	p = other_pointer(other_alloc(10));
+	for (i = 0; i < 10; i++)
+		p[i] = 1;
+
+	int_free(5);
+
+	p = int_pointer(int_alloc(10));
+	for (i = 0; i < 10; i++)
+		p[i] = 2;
+
+	int_free(5);
+	int_commit();
+
+	for (i = 0; i < int_pool.committed; i++)
+		printf("%d\n", *int_pointer(i));
+
+	for (i = 0; i < other_pool.committed; i++)
+		printf("%d\n", *other_pointer(i));
+
+	int_reset();
+	int_reset();
+	other_reset();
+	return 0;
+}
diff --git a/vcs-svn/obj_pool.h b/vcs-svn/obj_pool.h
index f60c872..90eda15 100644
--- a/vcs-svn/obj_pool.h
+++ b/vcs-svn/obj_pool.h
@@ -16,21 +16,9 @@ static struct { \
 	uint32_t size; \
 	uint32_t capacity; \
 	obj_t *base; \
-	FILE *file; \
-} pre##_pool = { 0, 0, 0, NULL, NULL}; \
+} pre##_pool = {0, 0, 0, NULL}; \
 static MAYBE_UNUSED void pre##_init(void) \
 { \
-	struct stat st; \
-	pre##_pool.file = fopen(#pre ".bin", "a+"); \
-	rewind(pre##_pool.file); \
-	fstat(fileno(pre##_pool.file), &st); \
-	pre##_pool.size = st.st_size / sizeof(obj_t); \
-	pre##_pool.committed = pre##_pool.size; \
-	pre##_pool.capacity = pre##_pool.size * 2; \
-	if (pre##_pool.capacity < initial_capacity) \
-		pre##_pool.capacity = initial_capacity; \
-	pre##_pool.base = malloc(pre##_pool.capacity * sizeof(obj_t)); \
-	fread(pre##_pool.base, sizeof(obj_t), pre##_pool.size, pre##_pool.file); \
 } \
 static MAYBE_UNUSED uint32_t pre##_alloc(uint32_t count) \
 { \
@@ -62,19 +50,15 @@ static MAYBE_UNUSED obj_t *pre##_pointer(uint32_t offset) \
 } \
 static MAYBE_UNUSED void pre##_commit(void) \
 { \
-	pre##_pool.committed += fwrite(pre##_pool.base + pre##_pool.committed, \
-		sizeof(obj_t), pre##_pool.size - pre##_pool.committed, \
-		pre##_pool.file); \
+	pre##_pool.committed = pre##_pool.size; \
 } \
 static MAYBE_UNUSED void pre##_reset(void) \
 { \
 	free(pre##_pool.base); \
-	if (pre##_pool.file) \
-		fclose(pre##_pool.file); \
 	pre##_pool.base = NULL; \
 	pre##_pool.size = 0; \
 	pre##_pool.capacity = 0; \
-	pre##_pool.file = NULL; \
+	pre##_pool.committed = 0; \
 }
 
 #endif
-- 

* Re: [PATCH 4/8] Add treap implementation
  2010-07-15 16:23 ` [PATCH 4/8] Add treap implementation Ramkumar Ramachandra
@ 2010-07-15 19:09   ` Jonathan Nieder
  2010-07-15 19:18     ` Ramkumar Ramachandra
  0 siblings, 1 reply; 79+ messages in thread
From: Jonathan Nieder @ 2010-07-15 19:09 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Ramkumar Ramachandra wrote:

> Treaps provide a memory-efficient binary search tree structure.
> Insertion/deletion/search are about as fast in the average
> case as red-black trees and the chances of worst-case behavior are
> vanishingly small, thanks to (pseudo-)randomness.  The bad worst-case
> behavior is a small price to pay, given that treaps are much simpler
> to implement.

I still haven’t checked this implementation in detail, but it seemed
to work in practice and is about to change anyway.

I like the documentation updates.  What else changed from the
previous round?

* Re: [PATCH 3/8] Add memory pool library
  2010-07-15 18:57   ` Jonathan Nieder
@ 2010-07-15 19:12     ` Ramkumar Ramachandra
  0 siblings, 0 replies; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-07-15 19:12 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Hi Jonathan,

Jonathan Nieder writes:
> Ramkumar Ramachandra wrote:
> 
> > void pre_commit(void);
> > 
> > 	Write the pool to file.
> 
> Except as a proof of concept, this is the wrong API to have.  The problem
> is that the caller cannot choose the filename, so it ends up being a .bin
> file in the current directory, wherever that is.
> 
> The log message leaves out a subtlety: this also increases the
> ‘committed’ value, and bookkeeping for that might be useful to some
> callers.
> 
> In other words:
> 
> > +static MAYBE_UNUSED void pre##_init(void) \
> > +{ \
> > +	struct stat st; \
> > +	pre##_pool.file = fopen(#pre ".bin", "a+"); \
> > +	rewind(pre##_pool.file); \
> > +	fstat(fileno(pre##_pool.file), &st); \
> > +	pre##_pool.size = st.st_size / sizeof(obj_t); \
> > +	pre##_pool.committed = pre##_pool.size; \
> > +	pre##_pool.capacity = pre##_pool.size * 2; \
> > +	if (pre##_pool.capacity < initial_capacity) \
> > +		pre##_pool.capacity = initial_capacity; \
> > +	pre##_pool.base = malloc(pre##_pool.capacity * sizeof(obj_t)); \
> > +	fread(pre##_pool.base, sizeof(obj_t), pre##_pool.size, pre##_pool.file); \
> > +} \
> 
> If you just want something working, I’d suggest stubbing this out:
> 
>  static MAYBE_UNUSED void pre##_init(void) \
>  { \
>  } \
> 
> It even almost makes sense as API: the _init function does all
> initialization tasks required, which is to say, none.  (The {0, ...}
> initializer already has taken care of setting all fields to 0).
> 
> > +static MAYBE_UNUSED void pre##_commit(void) \
> > +{ \
> > +	pre##_pool.committed += fwrite(pre##_pool.base + pre##_pool.committed, \
> > +		sizeof(obj_t), pre##_pool.size - pre##_pool.committed, \
> > +		pre##_pool.file); \
> > +} \
> 
> This can be simplified
> 
>  static MAYBE_UNUSED void pre##_commit(void) \
>  { \
> 	pre##_pool.committed = pre##_pool.size; \
>  } \
> 
> In other words, maybe something like this on top?  This includes the
> vestigial _init() function which really should not be there (it is
> confusing that some callers use it and others don’t).  I did not
> spend much time on it because in the end I suspect we might throw
> obj_pool away anyway.

Oh, right. I remember that you asked to turn off persistence for this
merge. We can include persistence in a later series.

Junio: Could you squash this diff into the commit?

diff --git a/vcs-svn/obj_pool.h b/vcs-svn/obj_pool.h
index f60c872..7a256d4 100644
--- a/vcs-svn/obj_pool.h
+++ b/vcs-svn/obj_pool.h
@@ -20,17 +20,6 @@ static struct { \
 } pre##_pool = { 0, 0, 0, NULL, NULL}; \
 static MAYBE_UNUSED void pre##_init(void) \
 { \
-	struct stat st; \
-	pre##_pool.file = fopen(#pre ".bin", "a+"); \
-	rewind(pre##_pool.file); \
-	fstat(fileno(pre##_pool.file), &st); \
-	pre##_pool.size = st.st_size / sizeof(obj_t); \
-	pre##_pool.committed = pre##_pool.size; \
-	pre##_pool.capacity = pre##_pool.size * 2; \
-	if (pre##_pool.capacity < initial_capacity) \
-		pre##_pool.capacity = initial_capacity; \
-	pre##_pool.base = malloc(pre##_pool.capacity * sizeof(obj_t)); \
-	fread(pre##_pool.base, sizeof(obj_t), pre##_pool.size, pre##_pool.file); \
 } \
 static MAYBE_UNUSED uint32_t pre##_alloc(uint32_t count) \
 { \
@@ -62,9 +51,7 @@ static MAYBE_UNUSED obj_t *pre##_pointer(uint32_t offset) \
 } \
 static MAYBE_UNUSED void pre##_commit(void) \
 { \
-	pre##_pool.committed += fwrite(pre##_pool.base + pre##_pool.committed, \
-		sizeof(obj_t), pre##_pool.size - pre##_pool.committed, \
-		pre##_pool.file); \
+	pre##_pool.committed = pre##_pool.size; \
 } \
 static MAYBE_UNUSED void pre##_reset(void) \
 { \

* Re: [PATCH 2/8] Introduce vcs-svn lib
  2010-07-15 17:46   ` Jonathan Nieder
@ 2010-07-15 19:15     ` Ramkumar Ramachandra
  0 siblings, 0 replies; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-07-15 19:15 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Hi Jonathan,

Jonathan Nieder writes:
> Ramkumar Ramachandra wrote:
> 
> > @@ -1908,6 +1912,8 @@ $(LIB_FILE): $(LIB_OBJS)
> >  $(XDIFF_LIB): $(XDIFF_OBJS)
> >  	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(XDIFF_OBJS)
> >  
> > +$(VCSSVN_LIB): $(VCSSVN_OBJS)
> > +	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(VCSSVN_OBJS)
> 
>  $ make vcs-svn/lib.a V=1
>  rm -f vcs-svn/lib.a && ar rcs vcs-svn/lib.a 
>  ar: vcs-svn/lib.a: No such file or directory
>  make: *** [vcs-svn/lib.a] Error 1
> 
> That is because the vcs-svn directory does not exist.  So
> probably the LICENSE should be added with the same patch
> (and git should learn to track empty directories).

Oops. Sorry about not checking this: it looked alright at a
glance. Yes, we can add LICENSE with this patch.

Junio: Could you squash in this diff?

--- /dev/null
+++ b/vcs-svn/LICENSE
@@ -0,0 +1,26 @@
+Copyright (C) 2010 David Barr <david.barr@cordelta.com>.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+1. Redistributions of source code must retain the above copyright
+   notice(s), this list of conditions and the following disclaimer
+   unmodified other than the allowable addition of one or more
+   copyright notices.
+2. Redistributions in binary form must reproduce the above copyright
+   notice(s), this list of conditions and the following disclaimer in
+   the documentation and/or other materials provided with the
+   distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) ``AS IS'' AND ANY
+EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

-- Ram

* Re: [PATCH 4/8] Add treap implementation
  2010-07-15 19:09   ` Jonathan Nieder
@ 2010-07-15 19:18     ` Ramkumar Ramachandra
  0 siblings, 0 replies; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-07-15 19:18 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Hi again,

Jonathan Nieder writes:
> I like the documentation updates.  What else changed from the
> previous round?

Nothing else. I should have posted a diff from last time.

-- Ram

* Re: [PATCH 6/8] Add stream helper library
  2010-07-15 16:23 ` [PATCH 6/8] Add stream helper library Ramkumar Ramachandra
@ 2010-07-15 19:19   ` Jonathan Nieder
  0 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-07-15 19:19 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Ramkumar Ramachandra wrote:

> From: David Barr <david.barr@cordelta.com>
> 
> This library provides thread-unsafe fgets()- and fread()-like
> functions where the caller does not have to supply a buffer.  It
> maintains a couple of static buffers and provides an API to use
> them.
> 
> NEEDSWORK: what should buffer_copy_bytes do on error?

For consistency with the rest of vcs-svn, it should do nothing. :)

I would love to see svn-fe diagnosing and recovering somehow from
faulty input.  For now it follows the easier route of just ignoring
(and skipping) confusing input.

Probably this should be mentioned in the man page somewhere.

[...]
> +void buffer_copy_bytes(uint32_t len)
> +{
[...]
> +		if (ferror(infile) || ferror(stdout))
> +			/* NEEDSWORK: handle error. */

The next input/output operation will fail, causing svn-fe to quit
early, so it would not be easy for such an error to go unnoticed.

* Re: [PATCH 7/8] Add infrastructure to write revisions in fast-export format
  2010-07-15 16:23 ` [PATCH 7/8] Add infrastructure to write revisions in fast-export format Ramkumar Ramachandra
@ 2010-07-15 19:28   ` Jonathan Nieder
  0 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-07-15 19:28 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Ramkumar Ramachandra wrote:

> diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
> new file mode 100644
> index 0000000..7552803
> --- /dev/null
> +++ b/vcs-svn/fast_export.c
> @@ -0,0 +1,75 @@
[...]
> +void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
> +                        uint32_t mark)
> +{
> +	/* Mode must be 100644, 100755, 120000, or 160000. */
> +	printf("M %06o :%d ", mode, mark);

David tweaked the API nicely upstream to enforce this constraint.  So
nice things will come with the next pull from him.

> diff --git a/vcs-svn/fast_export.h b/vcs-svn/fast_export.h
> new file mode 100644
> index 0000000..47e8f56
> --- /dev/null
> +++ b/vcs-svn/fast_export.h
> @@ -0,0 +1,14 @@
> +#ifndef FAST_EXPORT_H_
> +#define FAST_EXPORT_H_
> +
> +#include <stdint.h>
> +#include <time.h>

The usual convention within git is to rely on .c files to include
git-compat-util.h (indirectly through cache.h or directly).

> +/* Memory pools for commit, dir and dirent */
> +obj_pool_gen(commit, struct repo_commit, 4096);
> +obj_pool_gen(dir, struct repo_dir, 4096);
> +obj_pool_gen(dirent, struct repo_dirent, 4096);

Are the semicolons necessary?  (A nitpick, I know).
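
For what it is worth, a macro whose expansion ends in a function body
does not need one.  A minimal sketch, assuming a made-up counter_gen()
macro that is not part of this series and only mimics the shape of
obj_pool_gen():

```c
/*
 * Hypothetical generator macro: the expansion ends with the closing
 * brace of a function definition, so it is already a complete
 * top-level construct.
 */
#define counter_gen(pre) \
static unsigned pre##_count; \
static unsigned pre##_next(void) \
{ \
	return pre##_count++; \
}

/* No trailing semicolon: one here would be a stray empty declaration. */
counter_gen(mark)
```

With a trailing semicolon the file still compiles with gcc by default,
but -pedantic warns about the extra ';' outside of a function.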

That said, this part is looking pretty good.

* Re: [PATCH 8/8] Add SVN dump parser
  2010-07-15 16:23 ` [PATCH 8/8] Add SVN dump parser Ramkumar Ramachandra
@ 2010-07-15 19:52   ` Jonathan Nieder
  2010-07-15 20:04     ` Jonathan Nieder
  0 siblings, 1 reply; 79+ messages in thread
From: Jonathan Nieder @ 2010-07-15 19:52 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Ramkumar Ramachandra wrote:

> svndump parses data that is in SVN dumpfile format produced by
> `svnadmin dump` with the help of line_buffer and uses repo_tree and
> fast_export to emit a git fast-import stream.

Probably worth mentioning that this requires a dumpfile v2 (i.e., it
does not understand the svndiff0 delta format yet).

> +/* Create memory pool for log messages */
> +obj_pool_gen(log, char, 4096);
> +
> +static char* log_copy(uint32_t length, char *log)
> +{
> +	char *buffer;
> +	log_free(log_pool.size);
> +	buffer = log_pointer(log_alloc(length));
> +	strncpy(buffer, log, length);
> +	return buffer;
> +}

A strbuf would do just as well.  But using obj_pool looks like a fine
way to avoid depending on another piece of git code.

> +static struct {
> +	uint32_t svn_log, svn_author, svn_date, svn_executable, svn_special, uuid,
> +		revision_number, node_path, node_kind, node_action,
> +		node_copyfrom_path, node_copyfrom_rev, text_content_length,
> +		prop_content_length, content_length;
> +} keys;

Neat.  This is a textbook example of where to use a perfect hash, but
comparing interned strings is simpler and fast enough (and the
bottlenecks are elsewhere).
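
To spell out the interned-string idea, here is a minimal sketch.  It is
not the string_pool code from this series: intern(), the fixed-size
table and is_node_path() are all hypothetical, and a real pool would
use a search structure rather than a linear scan.

```c
#include <stdint.h>
#include <string.h>

#define MAX_STRINGS 64
#define MAX_LEN 32

/* Hypothetical toy interner: a fixed table searched linearly. */
static char table[MAX_STRINGS][MAX_LEN];
static uint32_t table_len;

/* Return a small integer id for s; equal strings get equal ids. */
static uint32_t intern(const char *s)
{
	uint32_t i;

	for (i = 0; i < table_len; i++)
		if (!strcmp(table[i], s))
			return i;
	if (table_len == MAX_STRINGS || strlen(s) >= MAX_LEN)
		return ~0u;	/* no room: report "not interned" */
	strcpy(table[table_len], s);
	return table_len++;
}

/* Ids for the header names of interest, filled in once at startup. */
static struct {
	uint32_t node_path, node_kind;
} ids;

static void init_ids(void)
{
	ids.node_path = intern("Node-path");
	ids.node_kind = intern("Node-kind");
}

/* Once the incoming name is interned, dispatch is an integer compare. */
static int is_node_path(const char *name)
{
	return intern(name) == ids.node_path;
}
```

The string comparison happens once per header line, inside intern();
the chain of else-ifs that follows only compares integers.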

> +static void read_props(void)
[...]
> +			if (key == keys.svn_log) {
> +				/* Value length excludes terminating nul. */
> +				rev_ctx.log = log_copy(len + 1, val);
> +			} else if (key == keys.svn_author) {
> +				rev_ctx.author = pool_intern(val);
> +			} else if (key == keys.svn_date) {
> +				if (parse_date_basic(val, &rev_ctx.timestamp, NULL))
> +					fprintf(stderr, "Invalid timestamp: %s\n", val);
> +			} else if (key == keys.svn_executable) {
> +				node_ctx.type = REPO_MODE_EXE;
> +			} else if (key == keys.svn_special) {
> +				node_ctx.type = REPO_MODE_LNK;
> +			}
> +			key = ~0;

Unknown properties are ignored.  Adding stream comments to allow
recovering them is left as an exercise for the interested reader.

> +static void handle_node(void)
> +{

A simple reader does not cope well with this kind of function.  It is
hard to know if it exhaustively deals with all cases.

But: with real-world repos (e.g. ASF) it works well enough.

> +void svndump_read(char *url)
> +{

Too long.  I realize that writing a state machine can be hard in C;
maybe it would be easiest to package up the state in a struct and
have a separate function for the main loop body.
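
Roughly what I have in mind, as a sketch only: struct parse_state and
handle_line() are names I made up, and the two headers handled here
stand in for the full list in svndump_read().

```c
#include <stdlib.h>
#include <string.h>

enum ctx { DUMP_CTX, REV_CTX, NODE_CTX };

/* Hypothetical bundle of what svndump_read() keeps in local variables. */
struct parse_state {
	enum ctx active;
	int revision;
	unsigned nodes;
};

/*
 * Process one "Key: value" header line in place.
 * Returns 1 if the key was handled, 0 if the line was ignored.
 */
static int handle_line(struct parse_state *st, char *line)
{
	char *val = strstr(line, ": ");

	if (!val)
		return 0;
	*val = '\0';	/* terminate the key */
	val += 2;	/* skip ": " to reach the value */

	if (!strcmp(line, "Revision-number")) {
		st->active = REV_CTX;
		st->revision = atoi(val);
		return 1;
	}
	if (!strcmp(line, "Node-path")) {
		st->active = NODE_CTX;
		st->nodes++;
		return 1;
	}
	return 0;
}
```

The main loop then shrinks to reading a line and passing it, together
with the state, to handle_line().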

The patches I didn’t comment on all look good.  I don’t think anything I did
comment on should prevent this reaching a wider audience.

Thanks for the pleasant read,
Jonathan

* Re: [PATCH 8/8] Add SVN dump parser
  2010-07-15 19:52   ` Jonathan Nieder
@ 2010-07-15 20:04     ` Jonathan Nieder
  0 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-07-15 20:04 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Jonathan Nieder wrote:

> I don’t think anything I did
> comment on should prevent this reaching a wider audience.

Except for the obj_pool persistence bit, but you already commented on
that.

Happy travels,
Jonathan

* Re: [PATCH 1/8] Export parse_date_basic() to convert a date string to timestamp
  2010-07-15 17:25   ` Jonathan Nieder
@ 2010-07-15 22:54     ` Junio C Hamano
  0 siblings, 0 replies; 79+ messages in thread
From: Junio C Hamano @ 2010-07-15 22:54 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier,
	Ramkumar Ramachandra

Jonathan Nieder <jrnieder@gmail.com> writes:

> Junio: I think this should be ejected from the series as an
> independently useful cleanup.
>
> Currently parse_date_toffset() is exported but not declared anywhere.
> This patch gives it a more predictable API and adds a declaration.

Yeah, that makes sense.  What this patch does seems to be what c5043cc
(Refactor parse_date for approxidate functions, 2010-06-03) should have
done from the beginning.

Thanks.

* Re: [PATCH 0/8] Resurrect rr/svn-export
  2010-07-15 16:22 [PATCH 0/8] Resurrect rr/svn-export Ramkumar Ramachandra
                   ` (7 preceding siblings ...)
  2010-07-15 16:23 ` [PATCH 8/8] Add SVN dump parser Ramkumar Ramachandra
@ 2010-07-16 10:13 ` Jonathan Nieder
  2010-07-16 10:16   ` [PATCH 3/9] Add memory pool library Jonathan Nieder
                     ` (2 more replies)
  8 siblings, 3 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-07-16 10:13 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Hi,

While Ram travels halfway across the globe, I have rerolled at his
request.  You can find the result at
  git://repo.or.cz/git/jrn.git rr/svn-fe

The main difference from the version sent already is the addition of
tests and a small tweak to the treap implementation to make them
pass.  I will send the two patches with new tests as replies to this
message.  Thoughts welcome, as always.

David Barr (5):
  Add memory pool library
  vcs-svn: Add string-specific memory pool
  Add stream helper library
  Add infrastructure to write revisions in fast-export format
  Add SVN dump parser

Jason Evans (1):
  Add treap implementation

Jonathan Nieder (3):
  Export parse_date_basic() to convert a date string to timestamp
  Introduce vcs-svn lib
  Add a sample user for the svndump library

 .gitignore                |    2 +
 Makefile                  |   14 ++-
 cache.h                   |    1 +
 contrib/svn-fe/.gitignore |    4 +
 contrib/svn-fe/Makefile   |   64 +++++++++
 contrib/svn-fe/svn-fe.c   |   15 ++
 contrib/svn-fe/svn-fe.txt |   68 +++++++++
 date.c                    |   14 +-
 t/t0080-vcs-svn.sh        |  101 ++++++++++++++
 test-obj-pool.c           |  116 ++++++++++++++++
 test-treap.c              |   65 +++++++++
 vcs-svn/LICENSE           |   33 +++++
 vcs-svn/fast_export.c     |   75 ++++++++++
 vcs-svn/fast_export.h     |   14 ++
 vcs-svn/line_buffer.c     |   91 +++++++++++++
 vcs-svn/line_buffer.h     |   12 ++
 vcs-svn/obj_pool.h        |   61 +++++++++
 vcs-svn/repo_tree.c       |  331 +++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/repo_tree.h       |   26 ++++
 vcs-svn/string_pool.c     |  101 ++++++++++++++
 vcs-svn/string_pool.h     |   12 ++
 vcs-svn/svndump.c         |  289 +++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndump.h         |    8 +
 vcs-svn/trp.h             |  223 ++++++++++++++++++++++++++++++
 vcs-svn/trp.txt           |   98 +++++++++++++
 25 files changed, 1829 insertions(+), 9 deletions(-)
 create mode 100644 contrib/svn-fe/.gitignore
 create mode 100644 contrib/svn-fe/Makefile
 create mode 100644 contrib/svn-fe/svn-fe.c
 create mode 100644 contrib/svn-fe/svn-fe.txt
 create mode 100755 t/t0080-vcs-svn.sh
 create mode 100644 test-obj-pool.c
 create mode 100644 test-treap.c
 create mode 100644 vcs-svn/LICENSE
 create mode 100644 vcs-svn/fast_export.c
 create mode 100644 vcs-svn/fast_export.h
 create mode 100644 vcs-svn/line_buffer.c
 create mode 100644 vcs-svn/line_buffer.h
 create mode 100644 vcs-svn/obj_pool.h
 create mode 100644 vcs-svn/repo_tree.c
 create mode 100644 vcs-svn/repo_tree.h
 create mode 100644 vcs-svn/string_pool.c
 create mode 100644 vcs-svn/string_pool.h
 create mode 100644 vcs-svn/svndump.c
 create mode 100644 vcs-svn/svndump.h
 create mode 100644 vcs-svn/trp.h
 create mode 100644 vcs-svn/trp.txt

-- 
1.7.2.rc2


* [PATCH 3/9] Add memory pool library
  2010-07-16 10:13 ` [PATCH 0/8] Resurrect rr/svn-export Jonathan Nieder
@ 2010-07-16 10:16   ` Jonathan Nieder
  2010-07-16 10:23   ` [PATCH 4/9] Add treap implementation Jonathan Nieder
  2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
  2 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-07-16 10:16 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

From: David Barr <david.barr@cordelta.com>

Add a memory pool library implemented using C macros. The
obj_pool_gen() macro creates a type-specific memory pool.

The memory pool library is distinguished from the existing specialized
allocators in alloc.c by using a contiguous block for all allocations.
This means that on one hand, long-lived pointers have to be written as
offsets, since the base address changes as the pool grows, but on the
other hand, the entire pool can be easily written to the file system.
This could allow the memory pool to persist between runs of an
application.

For the svn importer, such a facility is useful because each svn
revision can copy trees and files from any previous revision.  The
relevant information for all revisions has to persist somehow to
support incremental runs.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Stripped out pool_init.  And added tests!  There is not really
much an allocator can do, so it is fun to play around with.
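
To illustrate the point about offsets in the commit message, here is a
minimal standalone sketch. It is not the obj_pool_gen() macro itself:
the names int_pool, int_pool_alloc, int_pool_pointer and pool_demo are
made up for this example, and unlike the patch it checks the realloc()
result. It stores a value, forces the pool to grow, and reads the value
back through its offset.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pool generated by obj_pool_gen(). */
struct int_pool {
	uint32_t size;      /* slots handed out so far */
	uint32_t capacity;  /* slots currently allocated */
	int *base;          /* may move whenever the pool grows */
};

/* Hand out `count` slots and return the offset of the first one. */
static uint32_t int_pool_alloc(struct int_pool *p, uint32_t count)
{
	uint32_t offset = p->size;
	if (p->size + count > p->capacity) {
		uint32_t cap = p->capacity ? p->capacity : 1;
		int *grown;
		while (p->size + count > cap)
			cap *= 2;
		grown = realloc(p->base, cap * sizeof(*p->base));
		if (!grown) {
			perror("realloc");
			exit(1);
		}
		p->base = grown;
		p->capacity = cap;
	}
	p->size += count;
	return offset;
}

/* Translate an offset to a pointer; valid only until the next alloc. */
static int *int_pool_pointer(struct int_pool *p, uint32_t offset)
{
	return offset >= p->size ? NULL : &p->base[offset];
}

/* Store a value, force the pool to grow, then read it back by offset. */
static int pool_demo(void)
{
	struct int_pool pool = {0, 0, NULL};
	uint32_t first = int_pool_alloc(&pool, 1);
	int value;

	*int_pool_pointer(&pool, first) = 13;
	int_pool_alloc(&pool, 4096);	/* base may have moved by now */
	value = *int_pool_pointer(&pool, first);
	free(pool.base);
	return value;
}
```

A pointer saved before the second allocation could dangle after the
realloc(); the offset stays meaningful because it is resolved against
the current base on every access.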

 .gitignore         |    1 +
 Makefile           |    4 +-
 t/t0080-vcs-svn.sh |   79 +++++++++++++++++++++++++++++++++++
 test-obj-pool.c    |  116 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/obj_pool.h |   61 +++++++++++++++++++++++++++
 5 files changed, 260 insertions(+), 1 deletions(-)
 create mode 100755 t/t0080-vcs-svn.sh
 create mode 100644 test-obj-pool.c
 create mode 100644 vcs-svn/obj_pool.h

diff --git a/.gitignore b/.gitignore
index 14e2b6b..1e64a6a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -167,6 +167,7 @@
 /test-genrandom
 /test-index-version
 /test-match-trees
+/test-obj-pool
 /test-parse-options
 /test-path-utils
 /test-run-command
diff --git a/Makefile b/Makefile
index d6a779b..3b873cd 100644
--- a/Makefile
+++ b/Makefile
@@ -409,6 +409,7 @@ TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-genrandom
 TEST_PROGRAMS_NEED_X += test-match-trees
+TEST_PROGRAMS_NEED_X += test-obj-pool
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
 TEST_PROGRAMS_NEED_X += test-run-command
@@ -1863,7 +1864,8 @@ xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xinclude.h xdiff/xmacros.h xdiff/xdiff.h xdiff/xtypes.h \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
-$(VCSSVN_OBJS):
+$(VCSSVN_OBJS): \
+	vcs-svn/obj_pool.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/t/t0080-vcs-svn.sh b/t/t0080-vcs-svn.sh
new file mode 100755
index 0000000..3f29496
--- /dev/null
+++ b/t/t0080-vcs-svn.sh
@@ -0,0 +1,79 @@
+#!/bin/sh
+
+test_description='check infrastructure for svn importer'
+
+. ./test-lib.sh
+uint32_max=4294967295
+
+test_expect_success 'obj pool: store data' '
+	cat <<-\EOF >expected &&
+	0
+	1
+	EOF
+
+	test-obj-pool <<-\EOF >actual &&
+	alloc one 16
+	set one 13
+	test one 13
+	reset one
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success 'obj pool: NULL is offset ~0' '
+	echo "$uint32_max" >expected &&
+	echo null one | test-obj-pool >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'obj pool: out-of-bounds access' '
+	cat <<-EOF >expected &&
+	0
+	0
+	$uint32_max
+	$uint32_max
+	16
+	20
+	$uint32_max
+	EOF
+
+	test-obj-pool <<-\EOF >actual &&
+	alloc one 16
+	alloc two 16
+	offset one 20
+	offset two 20
+	alloc one 5
+	offset one 20
+	free one 1
+	offset one 20
+	reset one
+	reset two
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success 'obj pool: high-water mark' '
+	cat <<-\EOF >expected &&
+	0
+	0
+	10
+	20
+	20
+	20
+	EOF
+
+	test-obj-pool <<-\EOF >actual &&
+	alloc one 10
+	committed one
+	alloc one 10
+	commit one
+	committed one
+	alloc one 10
+	free one 20
+	committed one
+	reset one
+	EOF
+	test_cmp expected actual
+'
+
+test_done
diff --git a/test-obj-pool.c b/test-obj-pool.c
new file mode 100644
index 0000000..5018863
--- /dev/null
+++ b/test-obj-pool.c
@@ -0,0 +1,116 @@
+/*
+ * test-obj-pool.c: code to exercise the svn importer's object pool
+ */
+
+#include "cache.h"
+#include "vcs-svn/obj_pool.h"
+
+enum pool { POOL_ONE, POOL_TWO };
+obj_pool_gen(one, int, 1)
+obj_pool_gen(two, int, 4096)
+
+static uint32_t strtouint32(const char *s)
+{
+	char *end;
+	uintmax_t n = strtoumax(s, &end, 10);
+	if (*s == '\0' || (*end != '\n' && *end != '\0'))
+		die("invalid offset: %s", s);
+	return (uint32_t) n;
+}
+
+static void handle_command(const char *command, enum pool pool, const char *arg)
+{
+	switch (*command) {
+	case 'a':
+		if (!prefixcmp(command, "alloc ")) {
+			uint32_t n = strtouint32(arg);
+			printf("%"PRIu32"\n",
+				pool == POOL_ONE ?
+				one_alloc(n) : two_alloc(n));
+			return;
+		}
+	case 'c':
+		if (!prefixcmp(command, "commit ")) {
+			pool == POOL_ONE ? one_commit() : two_commit();
+			return;
+		}
+		if (!prefixcmp(command, "committed ")) {
+			printf("%"PRIu32"\n",
+				pool == POOL_ONE ?
+				one_pool.committed : two_pool.committed);
+			return;
+		}
+	case 'f':
+		if (!prefixcmp(command, "free ")) {
+			uint32_t n = strtouint32(arg);
+			pool == POOL_ONE ? one_free(n) : two_free(n);
+			return;
+		}
+	case 'n':
+		if (!prefixcmp(command, "null ")) {
+			printf("%"PRIu32"\n",
+				pool == POOL_ONE ?
+				one_offset(NULL) : two_offset(NULL));
+			return;
+		}
+	case 'o':
+		if (!prefixcmp(command, "offset ")) {
+			uint32_t n = strtouint32(arg);
+			printf("%"PRIu32"\n",
+				pool == POOL_ONE ?
+				one_offset(one_pointer(n)) :
+				two_offset(two_pointer(n)));
+			return;
+		}
+	case 'r':
+		if (!prefixcmp(command, "reset ")) {
+			pool == POOL_ONE ? one_reset() : two_reset();
+			return;
+		}
+	case 's':
+		if (!prefixcmp(command, "set ")) {
+			uint32_t n = strtouint32(arg);
+			if (pool == POOL_ONE)
+				*one_pointer(n) = 1;
+			else
+				*two_pointer(n) = 1;
+			return;
+		}
+	case 't':
+		if (!prefixcmp(command, "test ")) {
+			uint32_t n = strtouint32(arg);
+			printf("%d\n", pool == POOL_ONE ?
+				*one_pointer(n) : *two_pointer(n));
+			return;
+		}
+	default:
+		die("unrecognized command: %s", command);
+	}
+}
+
+static void handle_line(const char *line)
+{
+	const char *arg = strchr(line, ' ');
+	enum pool pool;
+
+	if (arg && !prefixcmp(arg + 1, "one"))
+		pool = POOL_ONE;
+	else if (arg && !prefixcmp(arg + 1, "two"))
+		pool = POOL_TWO;
+	else
+		die("no pool specified: %s", line);
+
+	handle_command(line, pool, arg + strlen("one "));
+}
+
+int main(int argc, char *argv[])
+{
+	struct strbuf sb = STRBUF_INIT;
+	if (argc != 1)
+		usage("test-obj-pool < script");
+
+	while (strbuf_getline(&sb, stdin, '\n') != EOF)
+		handle_line(sb.buf);
+	strbuf_release(&sb);
+	return 0;
+}
diff --git a/vcs-svn/obj_pool.h b/vcs-svn/obj_pool.h
new file mode 100644
index 0000000..deb6eb8
--- /dev/null
+++ b/vcs-svn/obj_pool.h
@@ -0,0 +1,61 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#ifndef OBJ_POOL_H_
+#define OBJ_POOL_H_
+
+#include "git-compat-util.h"
+
+#define MAYBE_UNUSED __attribute__((__unused__))
+
+#define obj_pool_gen(pre, obj_t, initial_capacity) \
+static struct { \
+	uint32_t committed; \
+	uint32_t size; \
+	uint32_t capacity; \
+	obj_t *base; \
+} pre##_pool = {0, 0, 0, NULL}; \
+static MAYBE_UNUSED uint32_t pre##_alloc(uint32_t count) \
+{ \
+	uint32_t offset; \
+	if (pre##_pool.size + count > pre##_pool.capacity) { \
+		while (pre##_pool.size + count > pre##_pool.capacity) \
+			if (pre##_pool.capacity) \
+				pre##_pool.capacity *= 2; \
+			else \
+				pre##_pool.capacity = initial_capacity; \
+		pre##_pool.base = realloc(pre##_pool.base, \
+					pre##_pool.capacity * sizeof(obj_t)); \
+	} \
+	offset = pre##_pool.size; \
+	pre##_pool.size += count; \
+	return offset; \
+} \
+static MAYBE_UNUSED void pre##_free(uint32_t count) \
+{ \
+	pre##_pool.size -= count; \
+} \
+static MAYBE_UNUSED uint32_t pre##_offset(obj_t *obj) \
+{ \
+	return obj == NULL ? ~0 : obj - pre##_pool.base; \
+} \
+static MAYBE_UNUSED obj_t *pre##_pointer(uint32_t offset) \
+{ \
+	return offset >= pre##_pool.size ? NULL : &pre##_pool.base[offset]; \
+} \
+static MAYBE_UNUSED void pre##_commit(void) \
+{ \
+	pre##_pool.committed = pre##_pool.size; \
+} \
+static MAYBE_UNUSED void pre##_reset(void) \
+{ \
+	free(pre##_pool.base); \
+	pre##_pool.base = NULL; \
+	pre##_pool.size = 0; \
+	pre##_pool.capacity = 0; \
+	pre##_pool.committed = 0; \
+}
+
+#endif
-- 
1.7.2.rc2


* [PATCH 4/9] Add treap implementation
  2010-07-16 10:13 ` [PATCH 0/8] Resurrect rr/svn-export Jonathan Nieder
  2010-07-16 10:16   ` [PATCH 3/9] Add memory pool library Jonathan Nieder
@ 2010-07-16 10:23   ` Jonathan Nieder
  2010-07-16 18:26     ` Jonathan Nieder
  2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
  2 siblings, 1 reply; 79+ messages in thread
From: Jonathan Nieder @ 2010-07-16 10:23 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

From: Jason Evans <jasone@canonware.com>

Provide macros to generate a type-specific treap implementation and
various functions to operate on it. It uses obj_pool.h to store memory
nodes in a treap.  Previously committed nodes are never removed from
the pool; after any *_commit operation, it is assumed (correctly, in
the case of svn-fast-export) that someone else must care about them.

Treaps provide a memory-efficient binary search tree structure.
Insertion/deletion/search are about as fast in the average
case as red-black trees and the chances of worst-case behavior are
vanishingly small, thanks to (pseudo-)randomness.  The bad worst-case
behavior is a small price to pay, given that treaps are much simpler
to implement.

From http://www.canonware.com/download/trp/trp_hash/trp.h

[db: Altered to reference nodes by offset from a common base pointer]
[db: Bob Jenkins' hashing implementation dropped for Knuth's]
[db: Methods unnecessary for search and insert dropped]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Tweaked treap_search() to always return the node after a missing
node, like it is documented to.  For vcs-svn this doesn’t matter
but the predictable semantics should make debugging easier.

The rest of the patches are almost identical to the versions Ram
sent; see the aforementioned git tree if you are interested in
trying them out.  Testing would be quite welcome.
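
As an aside on the "(pseudo-)randomness" mentioned in the commit
message: in trp.h a node's priority is not stored but derived from its
pool offset with a Fibonacci hash, and a child is rotated above its
parent when its priority is numerically smaller. The standalone sketch
below only reproduces that hash; fib_hash and print_priorities are
names invented for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Same multiplier as trpn_hash() in the patch; arithmetic wraps mod 2^32. */
static uint32_t fib_hash(uint32_t offset)
{
	return (uint32_t) (2654435761u * offset);
}

/* Print the priority assigned to each of the first `n` pool offsets. */
static void print_priorities(uint32_t n)
{
	uint32_t i;
	for (i = 0; i < n; i++)
		printf("offset %lu -> priority %lu\n",
			(unsigned long) i, (unsigned long) fib_hash(i));
}
```

Consecutive offsets map to widely separated priorities (offset 1 gives
2654435761, offset 2 wraps around to 1013904226), which is what keeps
the tree from degenerating when nodes are allocated and inserted in
sorted order.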

 .gitignore         |    1 +
 Makefile           |    3 +-
 t/t0080-vcs-svn.sh |   22 +++++
 test-treap.c       |   65 +++++++++++++++
 vcs-svn/LICENSE    |    3 +
 vcs-svn/trp.h      |  223 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/trp.txt    |   98 +++++++++++++++++++++++
 7 files changed, 414 insertions(+), 1 deletions(-)
 create mode 100644 test-treap.c
 create mode 100644 vcs-svn/trp.h
 create mode 100644 vcs-svn/trp.txt

diff --git a/.gitignore b/.gitignore
index 1e64a6a..af47653 100644
--- a/.gitignore
+++ b/.gitignore
@@ -173,6 +173,7 @@
 /test-run-command
 /test-sha1
 /test-sigchain
+/test-treap
 /common-cmds.h
 *.tar.gz
 *.dsc
diff --git a/Makefile b/Makefile
index 3b873cd..71d77c4 100644
--- a/Makefile
+++ b/Makefile
@@ -415,6 +415,7 @@ TEST_PROGRAMS_NEED_X += test-path-utils
 TEST_PROGRAMS_NEED_X += test-run-command
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sigchain
+TEST_PROGRAMS_NEED_X += test-treap
 TEST_PROGRAMS_NEED_X += test-index-version
 
 TEST_PROGRAMS = $(patsubst %,%$X,$(TEST_PROGRAMS_NEED_X))
@@ -1865,7 +1866,7 @@ xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
 $(VCSSVN_OBJS): \
-	vcs-svn/obj_pool.h
+	vcs-svn/obj_pool.h vcs-svn/trp.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/t/t0080-vcs-svn.sh b/t/t0080-vcs-svn.sh
index 3f29496..dd5bab8 100755
--- a/t/t0080-vcs-svn.sh
+++ b/t/t0080-vcs-svn.sh
@@ -76,4 +76,26 @@ test_expect_success 'obj pool: high-water mark' '
 	test_cmp expected actual
 '
 
+test_expect_success 'treap sort' '
+	cat <<-\EOF >unsorted &&
+	58
+	2
+	3
+	3
+	58
+	3
+	3
+	11
+	0
+	1
+	2
+	3
+	3
+	EOF
+	sort -n unsorted >expected &&
+
+	test-treap <unsorted >actual &&
+	test_cmp expected actual
+'
+
 test_done
diff --git a/test-treap.c b/test-treap.c
new file mode 100644
index 0000000..eae7324
--- /dev/null
+++ b/test-treap.c
@@ -0,0 +1,65 @@
+/*
+ * test-treap.c: code to exercise the svn importer's treap structure
+ */
+
+#include "cache.h"
+#include "vcs-svn/obj_pool.h"
+#include "vcs-svn/trp.h"
+
+struct int_node {
+	uintmax_t n;
+	struct trp_node children;
+};
+
+obj_pool_gen(node, struct int_node, 3)
+
+static int node_cmp(struct int_node *a, struct int_node *b)
+{
+	return (a->n > b->n) - (a->n < b->n);
+}
+
+trp_gen(static, treap_, struct int_node, children, node, node_cmp)
+
+static void strtonode(struct int_node *item, const char *s)
+{
+	char *end;
+	item->n = strtoumax(s, &end, 10);
+	if (*s == '\0' || (*end != '\n' && *end != '\0'))
+		die("invalid integer: %s", s);
+}
+
+int main(int argc, char *argv[])
+{
+	struct strbuf sb = STRBUF_INIT;
+	struct trp_root root = { ~0 };
+	uint32_t item;
+
+	if (argc != 1)
+		usage("test-treap < ints");
+
+	while (strbuf_getline(&sb, stdin, '\n') != EOF) {
+		item = node_alloc(1);
+		strtonode(node_pointer(item), sb.buf);
+		treap_insert(&root, node_pointer(item));
+	}
+
+	item = node_offset(treap_first(&root));
+	while (~item) {
+		uint32_t next;
+		struct int_node *tmp = node_pointer(node_alloc(1));
+
+		tmp->n = node_pointer(item)->n;
+		next = node_offset(treap_next(&root, node_pointer(item)));
+
+		treap_remove(&root, node_pointer(item));
+		item = node_offset(treap_search(&root, tmp));
+
+		if (item != next && (!~item || node_pointer(item)->n != tmp->n))
+			die("found %"PRIuMAX" in place of %"PRIuMAX"",
+				~item ? node_pointer(item)->n : ~(uintmax_t) 0,
+				~next ? node_pointer(next)->n : ~(uintmax_t) 0);
+		printf("%"PRIuMAX"\n", tmp->n);
+	}
+	node_reset();
+	return 0;
+}
diff --git a/vcs-svn/LICENSE b/vcs-svn/LICENSE
index 6e52372..a3d384c 100644
--- a/vcs-svn/LICENSE
+++ b/vcs-svn/LICENSE
@@ -1,6 +1,9 @@
 Copyright (C) 2010 David Barr <david.barr@cordelta.com>.
 All rights reserved.
 
+Copyright (C) 2008 Jason Evans <jasone@canonware.com>.
+All rights reserved.
+
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions
 are met:
diff --git a/vcs-svn/trp.h b/vcs-svn/trp.h
new file mode 100644
index 0000000..49940cf
--- /dev/null
+++ b/vcs-svn/trp.h
@@ -0,0 +1,223 @@
+/*
+ * C macro implementation of treaps.
+ *
+ * Usage:
+ *   #include <stdint.h>
+ *   #include "trp.h"
+ *   trp_gen(...)
+ *
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#ifndef TRP_H_
+#define TRP_H_
+
+#define MAYBE_UNUSED __attribute__((__unused__))
+
+/* Node structure. */
+struct trp_node {
+	uint32_t trpn_left;
+	uint32_t trpn_right;
+};
+
+/* Root structure. */
+struct trp_root {
+	uint32_t trp_root;
+};
+
+/* Pointer/Offset conversion. */
+#define trpn_pointer(a_base, a_offset) (a_base##_pointer(a_offset))
+#define trpn_offset(a_base, a_pointer) (a_base##_offset(a_pointer))
+#define trpn_modify(a_base, a_offset) \
+	do { \
+		if ((a_offset) < a_base##_pool.committed) { \
+			uint32_t old_offset = (a_offset);\
+			(a_offset) = a_base##_alloc(1); \
+			*trpn_pointer(a_base, a_offset) = \
+				*trpn_pointer(a_base, old_offset); \
+		} \
+	} while (0)
+
+/* Left accessors. */
+#define trp_left_get(a_base, a_field, a_node) \
+	(trpn_pointer(a_base, a_node)->a_field.trpn_left)
+#define trp_left_set(a_base, a_field, a_node, a_left) \
+	do { \
+		trpn_modify(a_base, a_node); \
+		trp_left_get(a_base, a_field, a_node) = (a_left); \
+	} while(0)
+
+/* Right accessors. */
+#define trp_right_get(a_base, a_field, a_node) \
+	(trpn_pointer(a_base, a_node)->a_field.trpn_right)
+#define trp_right_set(a_base, a_field, a_node, a_right) \
+	do { \
+		trpn_modify(a_base, a_node); \
+		trp_right_get(a_base, a_field, a_node) = (a_right); \
+	} while(0)
+
+/*
+ * Fibonacci hash function.
+ * The multiplier is the nearest prime to (2^32 times (√5 - 1)/2).
+ * See Knuth §6.4: volume 3, 3rd ed, p518.
+ */
+#define trpn_hash(a_node) (uint32_t) (2654435761u * (a_node))
+
+/* Priority accessors. */
+#define trp_prio_get(a_node) trpn_hash(a_node)
+
+/* Node initializer. */
+#define trp_node_new(a_base, a_field, a_node) \
+	do { \
+		trp_left_set(a_base, a_field, (a_node), ~0); \
+		trp_right_set(a_base, a_field, (a_node), ~0); \
+	} while(0)
+
+/* Internal utility macros. */
+#define trpn_first(a_base, a_field, a_root, r_node) \
+	do { \
+		(r_node) = (a_root); \
+		if ((r_node) == ~0) \
+			return NULL; \
+		while (~trp_left_get(a_base, a_field, (r_node))) \
+			(r_node) = trp_left_get(a_base, a_field, (r_node)); \
+	} while (0)
+
+#define trpn_rotate_left(a_base, a_field, a_node, r_node) \
+	do { \
+		(r_node) = trp_right_get(a_base, a_field, (a_node)); \
+		trp_right_set(a_base, a_field, (a_node), \
+			trp_left_get(a_base, a_field, (r_node))); \
+		trp_left_set(a_base, a_field, (r_node), (a_node)); \
+	} while(0)
+
+#define trpn_rotate_right(a_base, a_field, a_node, r_node) \
+	do { \
+		(r_node) = trp_left_get(a_base, a_field, (a_node)); \
+		trp_left_set(a_base, a_field, (a_node), \
+			trp_right_get(a_base, a_field, (r_node))); \
+		trp_right_set(a_base, a_field, (r_node), (a_node)); \
+	} while(0)
+
+#define trp_gen(a_attr, a_pre, a_type, a_field, a_base, a_cmp) \
+a_attr a_type MAYBE_UNUSED *a_pre##first(struct trp_root *treap) \
+{ \
+	uint32_t ret; \
+	trpn_first(a_base, a_field, treap->trp_root, ret); \
+	return trpn_pointer(a_base, ret); \
+} \
+a_attr a_type MAYBE_UNUSED *a_pre##next(struct trp_root *treap, a_type *node) \
+{ \
+	uint32_t ret; \
+	uint32_t offset = trpn_offset(a_base, node); \
+	if (~trp_right_get(a_base, a_field, offset)) { \
+		trpn_first(a_base, a_field, \
+			trp_right_get(a_base, a_field, offset), ret); \
+	} else { \
+		uint32_t tnode = treap->trp_root; \
+		ret = ~0; \
+		while (1) { \
+			int cmp = (a_cmp)(trpn_pointer(a_base, offset), \
+				trpn_pointer(a_base, tnode)); \
+			if (cmp < 0) { \
+				ret = tnode; \
+				tnode = trp_left_get(a_base, a_field, tnode); \
+			} else if (cmp > 0) { \
+				tnode = trp_right_get(a_base, a_field, tnode); \
+			} else { \
+				break; \
+			} \
+		} \
+	} \
+	return trpn_pointer(a_base, ret); \
+} \
+a_attr a_type MAYBE_UNUSED *a_pre##search(struct trp_root *treap, a_type *key) \
+{ \
+	int cmp; \
+	uint32_t ret = treap->trp_root; \
+	while (~ret && (cmp = (a_cmp)(key, trpn_pointer(a_base,ret)))) { \
+		if (cmp < 0) { \
+			if (!~trp_left_get(a_base, a_field, ret)) \
+				break; \
+			ret = trp_left_get(a_base, a_field, ret); \
+		} else { \
+			ret = trp_right_get(a_base, a_field, ret); \
+		} \
+	} \
+	return trpn_pointer(a_base, ret); \
+} \
+a_attr uint32_t MAYBE_UNUSED a_pre##insert_recurse(uint32_t cur_node, uint32_t ins_node) \
+{ \
+	if (cur_node == ~0) { \
+		return (ins_node); \
+	} else { \
+		uint32_t ret; \
+		int cmp = (a_cmp)(trpn_pointer(a_base, ins_node), \
+					trpn_pointer(a_base, cur_node)); \
+		if (cmp < 0) { \
+			uint32_t left = a_pre##insert_recurse( \
+				trp_left_get(a_base, a_field, cur_node), ins_node); \
+			trp_left_set(a_base, a_field, cur_node, left); \
+			if (trp_prio_get(left) < trp_prio_get(cur_node)) \
+				trpn_rotate_right(a_base, a_field, cur_node, ret); \
+			else \
+				ret = cur_node; \
+		} else { \
+			uint32_t right = a_pre##insert_recurse( \
+				trp_right_get(a_base, a_field, cur_node), ins_node); \
+			trp_right_set(a_base, a_field, cur_node, right); \
+			if (trp_prio_get(right) < trp_prio_get(cur_node)) \
+				trpn_rotate_left(a_base, a_field, cur_node, ret); \
+			else \
+				ret = cur_node; \
+		} \
+		return (ret); \
+	} \
+} \
+a_attr void MAYBE_UNUSED a_pre##insert(struct trp_root *treap, a_type *node) \
+{ \
+	uint32_t offset = trpn_offset(a_base, node); \
+	trp_node_new(a_base, a_field, offset); \
+	treap->trp_root = a_pre##insert_recurse(treap->trp_root, offset); \
+} \
+a_attr uint32_t MAYBE_UNUSED a_pre##remove_recurse(uint32_t cur_node, uint32_t rem_node) \
+{ \
+	int cmp = a_cmp(trpn_pointer(a_base, rem_node), \
+			trpn_pointer(a_base, cur_node)); \
+	if (cmp == 0) { \
+		uint32_t ret; \
+		uint32_t left = trp_left_get(a_base, a_field, cur_node); \
+		uint32_t right = trp_right_get(a_base, a_field, cur_node); \
+		if (left == ~0) { \
+			if (right == ~0) \
+				return (~0); \
+		} else if (right == ~0 || trp_prio_get(left) < trp_prio_get(right)) { \
+			trpn_rotate_right(a_base, a_field, cur_node, ret); \
+			right = a_pre##remove_recurse(cur_node, rem_node); \
+			trp_right_set(a_base, a_field, ret, right); \
+			return (ret); \
+		} \
+		trpn_rotate_left(a_base, a_field, cur_node, ret); \
+		left = a_pre##remove_recurse(cur_node, rem_node); \
+		trp_left_set(a_base, a_field, ret, left); \
+		return (ret); \
+	} else if (cmp < 0) { \
+		uint32_t left = a_pre##remove_recurse( \
+			trp_left_get(a_base, a_field, cur_node), rem_node); \
+		trp_left_set(a_base, a_field, cur_node, left); \
+		return (cur_node); \
+	} else { \
+		uint32_t right = a_pre##remove_recurse( \
+			trp_right_get(a_base, a_field, cur_node), rem_node); \
+		trp_right_set(a_base, a_field, cur_node, right); \
+		return (cur_node); \
+	} \
+} \
+a_attr void MAYBE_UNUSED a_pre##remove(struct trp_root *treap, a_type *node) \
+{ \
+	treap->trp_root = a_pre##remove_recurse(treap->trp_root, \
+		trpn_offset(a_base, node)); \
+} \
+
+#endif
diff --git a/vcs-svn/trp.txt b/vcs-svn/trp.txt
new file mode 100644
index 0000000..9247eba
--- /dev/null
+++ b/vcs-svn/trp.txt
@@ -0,0 +1,98 @@
+Motivation
+==========
+
+Treaps provide a memory-efficient binary search tree structure.
+Insertion/deletion/search are about as fast in the average
+case as red-black trees and the chances of worst-case behavior are
+vanishingly small, thanks to (pseudo-)randomness.  The bad worst-case
+behavior is a small price to pay, given that treaps are much simpler
+to implement.
+
+API
+===
+
+The trp API generates a data structure and functions to handle a
+large growing set of objects stored in a pool.
+
+The caller:
+
+. Specifies parameters for the generated functions with the
+  trp_gen(static, foo_, ...) macro.
+
+. Allocates a `struct trp_root` variable and sets it to {~0}.
+
+. Adds new nodes to the set using `foo_insert`.
+
+. Can find a specific item in the set using `foo_search`.
+
+. Can iterate over items in the set using `foo_first` and `foo_next`.
+
+. Can remove an item from the set using `foo_remove`.
+
+Example:
+
+----
+struct ex_node {
+	const char *s;
+	struct trp_node ex_link;
+};
+static struct trp_root ex_base = {~0};
+obj_pool_gen(ex, struct ex_node, 4096);
+trp_gen(static, ex_, struct ex_node, ex_link, ex, strcmp)
+struct ex_node *item;
+
+item = ex_pointer(ex_alloc(1));
+item->s = "hello";
+ex_insert(&ex_base, item);
+item = ex_pointer(ex_alloc(1));
+item->s = "goodbye";
+ex_insert(&ex_base, item);
+for (item = ex_first(&ex_base); item; item = ex_next(&ex_base, item))
+	printf("%s\n", item->s);
+----
+
+Functions
+---------
+
+trp_gen(attr, foo_, node_type, link_field, pool, cmp)::
+
+	Generate a type-specific treap implementation.
++
+. The storage class for generated functions will be 'attr' (e.g., `static`).
+. Generated function names are prefixed with 'foo_' (e.g., `treap_`).
+. Treap nodes will be of type 'node_type' (e.g., `struct treap_node`).
+  This type must be a struct with at least one `struct trp_node` field
+  to point to its children.
+. The field used to access child nodes will be 'link_field'.
+. All treap nodes must lie in the 'pool' object pool.
+. Treap nodes must be totally ordered by the 'cmp' relation, with the
+  following prototype:
++
+int (*cmp)(node_type \*a, node_type \*b)
++
+and returning a value less than, equal to, or greater than zero
+according to the result of comparison.
+
+void foo_insert(struct trp_root *treap, node_type \*node)::
+
+	Insert node into treap.  If inserted multiple times,
+	a node will appear in the treap multiple times.
+
+void foo_remove(struct trp_root *treap, node_type \*node)::
+
+	Remove node from treap.  Caller must ensure node is
+	present in treap before using this function.
+
+node_type *foo_search(struct trp_root \*treap, node_type \*key)::
+
+	Search for a node that matches key.  If no match is found,
+	return what would be key's successor, were key in treap
+	(NULL if no successor).
+
+node_type *foo_first(struct trp_root \*treap)::
+
+	Find the first item from the treap, in sorted order.
+
+node_type *foo_next(struct trp_root \*treap, node_type \*node)::
+
+	Find the next item.
-- 
1.7.2.rc2


* Re: [PATCH 4/9] Add treap implementation
  2010-07-16 10:23   ` [PATCH 4/9] Add treap implementation Jonathan Nieder
@ 2010-07-16 18:26     ` Jonathan Nieder
  0 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-07-16 18:26 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Jonathan Nieder wrote:

> Tweaked treap_search() to always return the node after a missing
> node, like it is documented to.

In this case, the documentation was wrong.

> For vcs-svn this doesn’t matter

Or rather, it does.  Sorry about that.

-- 8< --
Subject: vcs-svn: treap_search should return NULL for missing items

In a misguided attempt to make the code match the documentation,
commit 4692f8e7d (Add treap implementation, 2010-07-15) changed
the semantics of treap_search to return the /next/ node when a
node is missing.

That is great in some circumstances (and the new tests even rely on
it), but the rest of vcs-svn relies on treap_search to return
NULL in that case instead.  The documentation only suggested
otherwise because of a typo.

So fix it: now treap_search can do what it was always supposed
to (return NULL on failure) and Jason Evans’s treap_nsearch function
can be used to keep the test suite working.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
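For readers who want the two contracts side by side, here is a small
sketch on an ordinary pointer-based binary search tree. It illustrates
the documented behaviour only (exact match or NULL, versus exact match
or successor); it is not how trp.h implements either function, and
bst_node, bst_search and bst_nsearch are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical pointer-based node; trp.h links nodes by pool offset instead. */
struct bst_node {
	int key;
	struct bst_node *left;
	struct bst_node *right;
};

/* Exact lookup: NULL when the key is absent. */
static struct bst_node *bst_search(struct bst_node *root, int key)
{
	while (root && root->key != key)
		root = key < root->key ? root->left : root->right;
	return root;
}

/* Lookup with fallback: the smallest node greater than key if key is absent. */
static struct bst_node *bst_nsearch(struct bst_node *root, int key)
{
	struct bst_node *successor = NULL;
	while (root) {
		if (key == root->key)
			return root;
		if (key < root->key) {
			successor = root;	/* best candidate so far */
			root = root->left;
		} else {
			root = root->right;
		}
	}
	return successor;
}
```

Callers that treat a non-NULL result as "found" need the first
contract, which is why the rest of vcs-svn depends on it.
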
 test-treap.c    |    2 +-
 vcs-svn/trp.h   |   13 +++++++++++++
 vcs-svn/trp.txt |    9 +++++++--
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/test-treap.c b/test-treap.c
index eae7324..cdba511 100644
--- a/test-treap.c
+++ b/test-treap.c
@@ -52,7 +52,7 @@ int main(int argc, char *argv[])
 		next = node_offset(treap_next(&root, node_pointer(item)));
 
 		treap_remove(&root, node_pointer(item));
-		item = node_offset(treap_search(&root, tmp));
+		item = node_offset(treap_nsearch(&root, tmp));
 
 		if (item != next && (!~item || node_pointer(item)->n != tmp->n))
 			die("found %"PRIuMAX" in place of %"PRIuMAX"",
diff --git a/vcs-svn/trp.h b/vcs-svn/trp.h
index 49940cf..1f5f51f 100644
--- a/vcs-svn/trp.h
+++ b/vcs-svn/trp.h
@@ -138,6 +138,19 @@ a_attr a_type MAYBE_UNUSED *a_pre##search(struct trp_root *treap, a_type *key) \
 	uint32_t ret = treap->trp_root; \
 	while (~ret && (cmp = (a_cmp)(key, trpn_pointer(a_base,ret)))) { \
 		if (cmp < 0) { \
+			ret = trp_left_get(a_base, a_field, ret); \
+		} else { \
+			ret = trp_right_get(a_base, a_field, ret); \
+		} \
+	} \
+	return trpn_pointer(a_base, ret); \
+} \
+a_attr a_type MAYBE_UNUSED *a_pre##nsearch(struct trp_root *treap, a_type *key) \
+{ \
+	int cmp; \
+	uint32_t ret = treap->trp_root; \
+	while (~ret && (cmp = (a_cmp)(key, trpn_pointer(a_base,ret)))) { \
+		if (cmp < 0) { \
 			if (!~trp_left_get(a_base, a_field, ret)) \
 				break; \
 			ret = trp_left_get(a_base, a_field, ret); \
diff --git a/vcs-svn/trp.txt b/vcs-svn/trp.txt
index 9247eba..eb4c191 100644
--- a/vcs-svn/trp.txt
+++ b/vcs-svn/trp.txt
@@ -86,8 +86,13 @@ void foo_remove(struct trp_root *treap, node_type \*node)::
 node_type *foo_search(struct trp_root \*treap, node_type \*key)::
 
 	Search for a node that matches key.  If no match is found,
-	return what would be key's successor, were key in treap
-	(NULL if no successor).
+	result is NULL.
+
+node_type *foo_nsearch(struct trp_root \*treap, node_type \*key)::
+
+	Like `foo_search`, but if the key is missing return what
+	would be key's successor, were key in treap (NULL if no
+	successor).
 
 node_type *foo_first(struct trp_root \*treap)::
 
-- 
1.7.2.rc2


* [PATCH 0/10] rr/svn-export reroll
  2010-07-16 10:13 ` [PATCH 0/8] Resurrect rr/svn-export Jonathan Nieder
  2010-07-16 10:16   ` [PATCH 3/9] Add memory pool library Jonathan Nieder
  2010-07-16 10:23   ` [PATCH 4/9] Add treap implementation Jonathan Nieder
@ 2010-08-09 21:57   ` Jonathan Nieder
  2010-08-09 22:01     ` [PATCH 01/10] Export parse_date_basic() to convert a date string to timestamp Jonathan Nieder
                       ` (10 more replies)
  2 siblings, 11 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-09 21:57 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Hi!

svn-fe has some serious changes on the horizon.  As a preparation,
let’s round up what we have now.

The most controversial change is probably the new svn-fe test, which
takes about 15 seconds (for the “svnadmin load”, not the svn-fe
step :)).  It is in the t9* series, so hopefully that will not
dissuade people from running the earlier tests.

The main highlight in the changes is a new

	Input error

to stderr if a system call failed in reading in the dump file.
It still returns status 0 in this and other error situations,
though.
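
For anyone unfamiliar with the pattern, the check amounts to something
like the sketch below. This is not the svn-fe code: drain_and_check is
a made-up helper that reads a stream to the end and then looks at the
stream's error flag, assuming plain stdio is used for input.

```c
#include <stdio.h>

/*
 * Hypothetical helper: consume `in` in line-sized chunks, then report a
 * read failure on `err`.  Returns 1 if the stream's error flag is set,
 * else 0.
 */
static int drain_and_check(FILE *in, FILE *err)
{
	char line[4096];

	while (fgets(line, sizeof(line), in))
		; /* a real parser would handle each line here */
	if (ferror(in)) {
		fputs("Input error\n", err);
		return 1;
	}
	return 0;
}
```

A caller that wanted a failing exit status could return this value
from main(); as noted above, this series only prints the message.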

Based on maint (for no good reason; that’s just where I tried it).
Intended to replace rr/svn-export in pu (only if Ram likes it, of
course).

Thoughts welcome.

David Barr (5):
  Add memory pool library
  Add string-specific memory pool
  Add stream helper library
  Infrastructure to write revisions in fast-export format
  SVN dump parser

Jason Evans (1):
  Add treap implementation

Jonathan Nieder (4):
  Export parse_date_basic() to convert a date string to timestamp
  Introduce vcs-svn lib
  Update svn-fe manual
  svn-fe manual: Clarify warning about deltas in dumpfiles

 .gitignore                |    5 +
 Makefile                  |   25 +++-
 cache.h                   |    1 +
 contrib/svn-fe/svn-fe.c   |    1 +
 contrib/svn-fe/svn-fe.txt |   19 ++--
 date.c                    |   14 +-
 t/t0080-vcs-svn.sh        |  171 +++++++++++++++++++++++
 t/t9010-svn-fe.sh         |   32 +++++
 test-line-buffer.c        |   46 +++++++
 test-obj-pool.c           |  116 ++++++++++++++++
 test-string-pool.c        |   31 +++++
 test-svn-fe.c             |   18 +++
 test-treap.c              |   65 +++++++++
 vcs-svn/LICENSE           |   33 +++++
 vcs-svn/fast_export.c     |   74 ++++++++++
 vcs-svn/fast_export.h     |   11 ++
 vcs-svn/line_buffer.c     |  102 ++++++++++++++
 vcs-svn/line_buffer.h     |   12 ++
 vcs-svn/line_buffer.txt   |   62 +++++++++
 vcs-svn/obj_pool.h        |   61 +++++++++
 vcs-svn/repo_tree.c       |  328 +++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/repo_tree.h       |   26 ++++
 vcs-svn/string_pool.c     |  102 ++++++++++++++
 vcs-svn/string_pool.h     |   11 ++
 vcs-svn/string_pool.txt   |   43 ++++++
 vcs-svn/svndump.c         |  302 +++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndump.h         |    9 ++
 vcs-svn/trp.h             |  236 ++++++++++++++++++++++++++++++++
 vcs-svn/trp.txt           |  103 ++++++++++++++
 29 files changed, 2040 insertions(+), 19 deletions(-)
 create mode 100755 t/t0080-vcs-svn.sh
 create mode 100644 t/t9010-svn-fe.sh
 create mode 100644 test-line-buffer.c
 create mode 100644 test-obj-pool.c
 create mode 100644 test-string-pool.c
 create mode 100644 test-svn-fe.c
 create mode 100644 test-treap.c
 create mode 100644 vcs-svn/LICENSE
 create mode 100644 vcs-svn/fast_export.c
 create mode 100644 vcs-svn/fast_export.h
 create mode 100644 vcs-svn/line_buffer.c
 create mode 100644 vcs-svn/line_buffer.h
 create mode 100644 vcs-svn/line_buffer.txt
 create mode 100644 vcs-svn/obj_pool.h
 create mode 100644 vcs-svn/repo_tree.c
 create mode 100644 vcs-svn/repo_tree.h
 create mode 100644 vcs-svn/string_pool.c
 create mode 100644 vcs-svn/string_pool.h
 create mode 100644 vcs-svn/string_pool.txt
 create mode 100644 vcs-svn/svndump.c
 create mode 100644 vcs-svn/svndump.h
 create mode 100644 vcs-svn/trp.h
 create mode 100644 vcs-svn/trp.txt

-- 


* [PATCH 01/10] Export parse_date_basic() to convert a date string to timestamp
  2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
@ 2010-08-09 22:01     ` Jonathan Nieder
  2010-08-09 22:04     ` [PATCH 02/10] Introduce vcs-svn lib Jonathan Nieder
                       ` (9 subsequent siblings)
  10 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-09 22:01 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

approxidate() is not appropriate for reading machine-written dates
because it guesses instead of erroring out on malformed dates.
parse_date() is less convenient since it returns its output as a
string.  So export the underlying function that writes a timestamp.

While at it, change the return value to match the usual convention:
return 0 for success and -1 for failure.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
As before, I think this improves code clarity, independently of
its use for svn-fe.  So I would not be unhappy if it is applied
as a separate topic.

No change from last round.
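
For anyone reading along, here is a rough sketch of the calling
convention this patch adopts (0 for success, -1 for failure).  It does
not use parse_date_basic() itself: parse_seconds_strict() and
seconds_to_minutes() are hypothetical stand-ins invented for
illustration, and the strict parser only accepts a plain decimal
number of seconds.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a strict, machine-oriented parser.
 * It errors out on malformed input instead of guessing, and
 * returns 0 for success and -1 for failure.
 */
int parse_seconds_strict(const char *s, unsigned long *timestamp)
{
	char *end;
	unsigned long n;

	assert(s && timestamp);
	if (*s < '0' || *s > '9')
		return -1;	/* reject empty input, signs, whitespace */
	errno = 0;
	n = strtoul(s, &end, 10);
	if (errno || *end != '\0')
		return -1;	/* reject overflow and trailing garbage */
	*timestamp = n;
	return 0;
}

/*
 * Hypothetical caller written in the style of the updated
 * parse_date(): bail out early when the parser returns nonzero.
 */
int seconds_to_minutes(const char *s, unsigned long *minutes)
{
	unsigned long timestamp;

	if (parse_seconds_strict(s, &timestamp))
		return -1;
	*minutes = timestamp / 60;
	return 0;
}
```

With this convention a caller can write "if (parse(...)) return -1;"
without having to remember whether a positive value meant success.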

 cache.h |    1 +
 date.c  |   14 ++++++--------
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/cache.h b/cache.h
index c9fa3df..68258be 100644
--- a/cache.h
+++ b/cache.h
@@ -811,6 +811,7 @@ const char *show_date_relative(unsigned long time, int tz,
 			       char *timebuf,
 			       size_t timebuf_size);
 int parse_date(const char *date, char *buf, int bufsize);
+int parse_date_basic(const char *date, unsigned long *timestamp, int *offset);
 void datestamp(char *buf, int bufsize);
 #define approxidate(s) approxidate_careful((s), NULL)
 unsigned long approxidate_careful(const char *, int *);
diff --git a/date.c b/date.c
index 3c981f7..00f9eb5 100644
--- a/date.c
+++ b/date.c
@@ -586,7 +586,7 @@ static int date_string(unsigned long date, int offset, char *buf, int len)
 
 /* Gr. strptime is crap for this; it doesn't have a way to require RFC2822
    (i.e. English) day/month names, and it doesn't work correctly with %z. */
-int parse_date_toffset(const char *date, unsigned long *timestamp, int *offset)
+int parse_date_basic(const char *date, unsigned long *timestamp, int *offset)
 {
 	struct tm tm;
 	int tm_gmt;
@@ -642,17 +642,16 @@ int parse_date_toffset(const char *date, unsigned long *timestamp, int *offset)
 
 	if (!tm_gmt)
 		*timestamp -= *offset * 60;
-	return 1; /* success */
+	return 0; /* success */
 }
 
 int parse_date(const char *date, char *result, int maxlen)
 {
 	unsigned long timestamp;
 	int offset;
-	if (parse_date_toffset(date, &timestamp, &offset) > 0)
-		return date_string(timestamp, offset, result, maxlen);
-	else
+	if (parse_date_basic(date, &timestamp, &offset))
 		return -1;
+	return date_string(timestamp, offset, result, maxlen);
 }
 
 enum date_mode parse_date_format(const char *format)
@@ -1004,9 +1003,8 @@ unsigned long approxidate_relative(const char *date, const struct timeval *tv)
 	int offset;
 	int errors = 0;
 
-	if (parse_date_toffset(date, &timestamp, &offset) > 0)
+	if (!parse_date_basic(date, &timestamp, &offset))
 		return timestamp;
-
 	return approxidate_str(date, tv, &errors);
 }
 
@@ -1019,7 +1017,7 @@ unsigned long approxidate_careful(const char *date, int *error_ret)
 	if (!error_ret)
 		error_ret = &dummy;
 
-	if (parse_date_toffset(date, &timestamp, &offset) > 0) {
+	if (!parse_date_basic(date, &timestamp, &offset)) {
 		*error_ret = 0;
 		return timestamp;
 	}
-- 
1.7.2.1.544.ga752d.dirty


* [PATCH 02/10] Introduce vcs-svn lib
  2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
  2010-08-09 22:01     ` [PATCH 01/10] Export parse_date_basic() to convert a date string to timestamp Jonathan Nieder
@ 2010-08-09 22:04     ` Jonathan Nieder
  2010-08-09 22:11     ` [PATCH 03/10] Add memory pool library Jonathan Nieder
                       ` (8 subsequent siblings)
  10 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-09 22:04 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Teach the build system to build a separate library for the
upcoming subversion interop support.

The resulting vcs-svn/lib.a does not contain any code, nor is
it built during a normal build.  This is just scaffolding for
later changes.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
This is just for reference; no change from last round.

 Makefile        |    8 +++++++-
 vcs-svn/LICENSE |   26 ++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletions(-)
 create mode 100644 vcs-svn/LICENSE

diff --git a/Makefile b/Makefile
index f33648d..71cca35 100644
--- a/Makefile
+++ b/Makefile
@@ -468,6 +468,7 @@ export PYTHON_PATH
 
 LIB_FILE=libgit.a
 XDIFF_LIB=xdiff/lib.a
+VCSSVN_LIB=vcs-svn/lib.a
 
 LIB_H += advice.h
 LIB_H += archive.h
@@ -1739,7 +1740,8 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS)
+VCSSVN_OBJS =
+OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
 dep_dirs := $(addsuffix .depend,$(sort $(dir $(OBJECTS))))
@@ -1860,6 +1862,8 @@ http.o http-walker.o http-push.o remote-curl.o: http.h
 xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xinclude.h xdiff/xmacros.h xdiff/xdiff.h xdiff/xtypes.h \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
+
+$(VCSSVN_OBJS):
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
@@ -1908,6 +1912,8 @@ $(LIB_FILE): $(LIB_OBJS)
 $(XDIFF_LIB): $(XDIFF_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(XDIFF_OBJS)
 
+$(VCSSVN_LIB): $(VCSSVN_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(VCSSVN_OBJS)
 
 doc:
 	$(MAKE) -C Documentation all
diff --git a/vcs-svn/LICENSE b/vcs-svn/LICENSE
new file mode 100644
index 0000000..6e52372
--- /dev/null
+++ b/vcs-svn/LICENSE
@@ -0,0 +1,26 @@
+Copyright (C) 2010 David Barr <david.barr@cordelta.com>.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+1. Redistributions of source code must retain the above copyright
+   notice(s), this list of conditions and the following disclaimer
+   unmodified other than the allowable addition of one or more
+   copyright notices.
+2. Redistributions in binary form must reproduce the above copyright
+   notice(s), this list of conditions and the following disclaimer in
+   the documentation and/or other materials provided with the
+   distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) ``AS IS'' AND ANY
+EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-- 
1.7.2.1.544.ga752d.dirty


* [PATCH 03/10] Add memory pool library
  2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
  2010-08-09 22:01     ` [PATCH 01/10] Export parse_date_basic() to convert a date string to timestamp Jonathan Nieder
  2010-08-09 22:04     ` [PATCH 02/10] Introduce vcs-svn lib Jonathan Nieder
@ 2010-08-09 22:11     ` Jonathan Nieder
  2010-08-09 22:17     ` [PATCH 04/10] Add treap implementation Jonathan Nieder
                       ` (7 subsequent siblings)
  10 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-09 22:11 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

From: David Barr <david.barr@cordelta.com>

Add a memory pool library implemented using C macros. The
obj_pool_gen() macro creates a type-specific memory pool.

The memory pool library is distinguished from the existing specialized
allocators in alloc.c by using a contiguous block for all allocations.
This means that on one hand, long-lived pointers have to be written as
offsets, since the base address changes as the pool grows, but on the
other hand, the entire pool can be easily written to the file system.
This could allow the memory pool to persist between runs of an
application.

For the svn importer, such a facility is useful because each svn
revision can copy trees and files from any previous revision.  The
relevant information for all revisions has to persist somehow to
support incremental runs.

[rr: minor cleanups]
[jn: added tests; removed file system backing for now]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
The only change from last round is the notes at the end of the commit
message.  Hopefully David is less likely to be blamed for bugs I
introduced this way. :)
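
To illustrate why long-lived references have to be offsets rather
than pointers, here is a minimal sketch.  It is not the obj_pool_gen()
macro: struct int_pool and the int_pool_*() helpers are hypothetical
names made up for this note, and the growth policy is only an
approximation of the one in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature pool of ints, for illustration only. */
struct int_pool {
	uint32_t size;		/* number of slots handed out */
	uint32_t capacity;	/* number of slots allocated */
	int *base;		/* may move whenever the pool grows */
};

/* Reserve count slots and return the offset of the first one. */
uint32_t int_pool_alloc(struct int_pool *pool, uint32_t count)
{
	uint32_t offset = pool->size;

	if (pool->size + count > pool->capacity) {
		uint32_t cap = pool->capacity ? pool->capacity : 1;
		int *p;

		while (pool->size + count > cap)
			cap *= 2;
		p = realloc(pool->base, cap * sizeof(int));
		assert(p);	/* sketch only: real code should handle failure */
		pool->base = p;
		pool->capacity = cap;
	}
	pool->size += count;
	return offset;
}

/* Turn an offset into a pointer that is valid until the next alloc. */
int *int_pool_pointer(struct int_pool *pool, uint32_t offset)
{
	return offset >= pool->size ? NULL : &pool->base[offset];
}

/* Store a value, force the pool to grow, then read it back by offset. */
int int_pool_roundtrip(int value)
{
	struct int_pool pool = {0, 0, NULL};
	uint32_t off = int_pool_alloc(&pool, 1);
	int result;

	*int_pool_pointer(&pool, off) = value;
	int_pool_alloc(&pool, 1000);	/* realloc may move pool.base */
	result = *int_pool_pointer(&pool, off);
	free(pool.base);
	return result;
}
```

A pointer saved before the second int_pool_alloc() call could dangle
after the realloc, while the offset stays meaningful.  The same
property is what would make it possible to write the whole block to
disk and map it back at a different address.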

 .gitignore         |    1 +
 Makefile           |    4 +-
 t/t0080-vcs-svn.sh |   79 +++++++++++++++++++++++++++++++++++
 test-obj-pool.c    |  116 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/obj_pool.h |   61 +++++++++++++++++++++++++++
 5 files changed, 260 insertions(+), 1 deletions(-)
 create mode 100755 t/t0080-vcs-svn.sh
 create mode 100644 test-obj-pool.c
 create mode 100644 vcs-svn/obj_pool.h

diff --git a/.gitignore b/.gitignore
index 14e2b6b..1e64a6a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -167,6 +167,7 @@
 /test-genrandom
 /test-index-version
 /test-match-trees
+/test-obj-pool
 /test-parse-options
 /test-path-utils
 /test-run-command
diff --git a/Makefile b/Makefile
index 71cca35..eb471e7 100644
--- a/Makefile
+++ b/Makefile
@@ -409,6 +409,7 @@ TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-genrandom
 TEST_PROGRAMS_NEED_X += test-match-trees
+TEST_PROGRAMS_NEED_X += test-obj-pool
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
 TEST_PROGRAMS_NEED_X += test-run-command
@@ -1863,7 +1864,8 @@ xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xinclude.h xdiff/xmacros.h xdiff/xdiff.h xdiff/xtypes.h \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
-$(VCSSVN_OBJS):
+$(VCSSVN_OBJS): \
+	vcs-svn/obj_pool.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/t/t0080-vcs-svn.sh b/t/t0080-vcs-svn.sh
new file mode 100755
index 0000000..3f29496
--- /dev/null
+++ b/t/t0080-vcs-svn.sh
@@ -0,0 +1,79 @@
+#!/bin/sh
+
+test_description='check infrastructure for svn importer'
+
+. ./test-lib.sh
+uint32_max=4294967295
+
+test_expect_success 'obj pool: store data' '
+	cat <<-\EOF >expected &&
+	0
+	1
+	EOF
+
+	test-obj-pool <<-\EOF >actual &&
+	alloc one 16
+	set one 13
+	test one 13
+	reset one
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success 'obj pool: NULL is offset ~0' '
+	echo "$uint32_max" >expected &&
+	echo null one | test-obj-pool >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'obj pool: out-of-bounds access' '
+	cat <<-EOF >expected &&
+	0
+	0
+	$uint32_max
+	$uint32_max
+	16
+	20
+	$uint32_max
+	EOF
+
+	test-obj-pool <<-\EOF >actual &&
+	alloc one 16
+	alloc two 16
+	offset one 20
+	offset two 20
+	alloc one 5
+	offset one 20
+	free one 1
+	offset one 20
+	reset one
+	reset two
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success 'obj pool: high-water mark' '
+	cat <<-\EOF >expected &&
+	0
+	0
+	10
+	20
+	20
+	20
+	EOF
+
+	test-obj-pool <<-\EOF >actual &&
+	alloc one 10
+	committed one
+	alloc one 10
+	commit one
+	committed one
+	alloc one 10
+	free one 20
+	committed one
+	reset one
+	EOF
+	test_cmp expected actual
+'
+
+test_done
diff --git a/test-obj-pool.c b/test-obj-pool.c
new file mode 100644
index 0000000..5018863
--- /dev/null
+++ b/test-obj-pool.c
@@ -0,0 +1,116 @@
+/*
+ * test-obj-pool.c: code to exercise the svn importer's object pool
+ */
+
+#include "cache.h"
+#include "vcs-svn/obj_pool.h"
+
+enum pool { POOL_ONE, POOL_TWO };
+obj_pool_gen(one, int, 1)
+obj_pool_gen(two, int, 4096)
+
+static uint32_t strtouint32(const char *s)
+{
+	char *end;
+	uintmax_t n = strtoumax(s, &end, 10);
+	if (*s == '\0' || (*end != '\n' && *end != '\0'))
+		die("invalid offset: %s", s);
+	return (uint32_t) n;
+}
+
+static void handle_command(const char *command, enum pool pool, const char *arg)
+{
+	switch (*command) {
+	case 'a':
+		if (!prefixcmp(command, "alloc ")) {
+			uint32_t n = strtouint32(arg);
+			printf("%"PRIu32"\n",
+				pool == POOL_ONE ?
+				one_alloc(n) : two_alloc(n));
+			return;
+		}
+	case 'c':
+		if (!prefixcmp(command, "commit ")) {
+			pool == POOL_ONE ? one_commit() : two_commit();
+			return;
+		}
+		if (!prefixcmp(command, "committed ")) {
+			printf("%"PRIu32"\n",
+				pool == POOL_ONE ?
+				one_pool.committed : two_pool.committed);
+			return;
+		}
+	case 'f':
+		if (!prefixcmp(command, "free ")) {
+			uint32_t n = strtouint32(arg);
+			pool == POOL_ONE ? one_free(n) : two_free(n);
+			return;
+		}
+	case 'n':
+		if (!prefixcmp(command, "null ")) {
+			printf("%"PRIu32"\n",
+				pool == POOL_ONE ?
+				one_offset(NULL) : two_offset(NULL));
+			return;
+		}
+	case 'o':
+		if (!prefixcmp(command, "offset ")) {
+			uint32_t n = strtouint32(arg);
+			printf("%"PRIu32"\n",
+				pool == POOL_ONE ?
+				one_offset(one_pointer(n)) :
+				two_offset(two_pointer(n)));
+			return;
+		}
+	case 'r':
+		if (!prefixcmp(command, "reset ")) {
+			pool == POOL_ONE ? one_reset() : two_reset();
+			return;
+		}
+	case 's':
+		if (!prefixcmp(command, "set ")) {
+			uint32_t n = strtouint32(arg);
+			if (pool == POOL_ONE)
+				*one_pointer(n) = 1;
+			else
+				*two_pointer(n) = 1;
+			return;
+		}
+	case 't':
+		if (!prefixcmp(command, "test ")) {
+			uint32_t n = strtouint32(arg);
+			printf("%d\n", pool == POOL_ONE ?
+				*one_pointer(n) : *two_pointer(n));
+			return;
+		}
+	default:
+		die("unrecognized command: %s", command);
+	}
+}
+
+static void handle_line(const char *line)
+{
+	const char *arg = strchr(line, ' ');
+	enum pool pool;
+
+	if (arg && !prefixcmp(arg + 1, "one"))
+		pool = POOL_ONE;
+	else if (arg && !prefixcmp(arg + 1, "two"))
+		pool = POOL_TWO;
+	else
+		die("no pool specified: %s", line);
+
+	handle_command(line, pool, arg + strlen("one "));
+}
+
+int main(int argc, char *argv[])
+{
+	struct strbuf sb = STRBUF_INIT;
+	if (argc != 1)
+		usage("test-obj-pool < script");
+
+	while (strbuf_getline(&sb, stdin, '\n') != EOF)
+		handle_line(sb.buf);
+	strbuf_release(&sb);
+	return 0;
+}
diff --git a/vcs-svn/obj_pool.h b/vcs-svn/obj_pool.h
new file mode 100644
index 0000000..deb6eb8
--- /dev/null
+++ b/vcs-svn/obj_pool.h
@@ -0,0 +1,61 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#ifndef OBJ_POOL_H_
+#define OBJ_POOL_H_
+
+#include "git-compat-util.h"
+
+#define MAYBE_UNUSED __attribute__((__unused__))
+
+#define obj_pool_gen(pre, obj_t, initial_capacity) \
+static struct { \
+	uint32_t committed; \
+	uint32_t size; \
+	uint32_t capacity; \
+	obj_t *base; \
+} pre##_pool = {0, 0, 0, NULL}; \
+static MAYBE_UNUSED uint32_t pre##_alloc(uint32_t count) \
+{ \
+	uint32_t offset; \
+	if (pre##_pool.size + count > pre##_pool.capacity) { \
+		while (pre##_pool.size + count > pre##_pool.capacity) \
+			if (pre##_pool.capacity) \
+				pre##_pool.capacity *= 2; \
+			else \
+				pre##_pool.capacity = initial_capacity; \
+		pre##_pool.base = realloc(pre##_pool.base, \
+					pre##_pool.capacity * sizeof(obj_t)); \
+	} \
+	offset = pre##_pool.size; \
+	pre##_pool.size += count; \
+	return offset; \
+} \
+static MAYBE_UNUSED void pre##_free(uint32_t count) \
+{ \
+	pre##_pool.size -= count; \
+} \
+static MAYBE_UNUSED uint32_t pre##_offset(obj_t *obj) \
+{ \
+	return obj == NULL ? ~0 : obj - pre##_pool.base; \
+} \
+static MAYBE_UNUSED obj_t *pre##_pointer(uint32_t offset) \
+{ \
+	return offset >= pre##_pool.size ? NULL : &pre##_pool.base[offset]; \
+} \
+static MAYBE_UNUSED void pre##_commit(void) \
+{ \
+	pre##_pool.committed = pre##_pool.size; \
+} \
+static MAYBE_UNUSED void pre##_reset(void) \
+{ \
+	free(pre##_pool.base); \
+	pre##_pool.base = NULL; \
+	pre##_pool.size = 0; \
+	pre##_pool.capacity = 0; \
+	pre##_pool.committed = 0; \
+}
+
+#endif
-- 
1.7.2.1.544.ga752d.dirty


* [PATCH 04/10] Add treap implementation
  2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
                       ` (2 preceding siblings ...)
  2010-08-09 22:11     ` [PATCH 03/10] Add memory pool library Jonathan Nieder
@ 2010-08-09 22:17     ` Jonathan Nieder
  2010-08-12 17:22       ` Junio C Hamano
  2010-08-09 22:34     ` [PATCH 05/10] Add string-specific memory pool Jonathan Nieder
                       ` (6 subsequent siblings)
  10 siblings, 1 reply; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-09 22:17 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

From: Jason Evans <jasone@canonware.com>

Provide macros to generate a type-specific treap implementation and
various functions to operate on it. It uses obj_pool.h to allocate the
nodes of a treap.  Previously committed nodes are never removed from
the pool; after any *_commit operation, it is assumed (correctly, in
the case of svn-fast-export) that someone else must care about them.

Treaps provide a memory-efficient binary search tree structure.
Insertion/deletion/search are about as fast in the average
case as red-black trees and the chances of worst-case behavior are
vanishingly small, thanks to (pseudo-)randomness.  The bad worst-case
behavior is a small price to pay, given that treaps are much simpler
to implement.

From http://www.canonware.com/download/trp/trp_hash/trp.h

[db: Altered to reference nodes by offset from a common base pointer]
[db: Bob Jenkins' hashing implementation dropped for Knuth's]
[db: Methods unnecessary for search and insert dropped]
[rr: Squelched compiler warnings]
[db: Added support for immutable treap nodes]
[jn: Reintroduced treap_nsearch(); with tests]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
With the treap_nsearch() fixup from last time squashed in and some
more history in the log message.
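
In case it helps review, the core idea (a binary search tree on keys
that is also a heap on pseudo-random priorities, with the priority
derived from the node's offset by the Fibonacci hash) can be sketched
without the macros.  This is a simplified illustration, not trp.h:
the sk_* names, the fixed-size node array and the plain int keys are
all assumptions of the sketch, and it has no remove operation or
copy-on-write for committed nodes.

```c
#include <assert.h>
#include <stdint.h>

#define SK_NIL (~(uint32_t)0)
#define SK_MAX_NODES 64

/* Hypothetical node store standing in for an obj_pool. */
struct sk_node {
	int key;
	uint32_t left;
	uint32_t right;
};
static struct sk_node sk_nodes[SK_MAX_NODES];
static uint32_t sk_count;

/* Priority is a hash of the offset, in the manner of trpn_hash(). */
uint32_t sk_prio(uint32_t offset)
{
	return (uint32_t) (2654435761u * offset);
}

/* Rotations preserve key order and return the new subtree root. */
uint32_t sk_rotate_left(uint32_t cur)
{
	uint32_t r = sk_nodes[cur].right;
	sk_nodes[cur].right = sk_nodes[r].left;
	sk_nodes[r].left = cur;
	return r;
}

uint32_t sk_rotate_right(uint32_t cur)
{
	uint32_t l = sk_nodes[cur].left;
	sk_nodes[cur].left = sk_nodes[l].right;
	sk_nodes[l].right = cur;
	return l;
}

/* Insert node ins below cur; lower priority values float upward. */
uint32_t sk_insert(uint32_t cur, uint32_t ins)
{
	if (cur == SK_NIL)
		return ins;
	if (sk_nodes[ins].key < sk_nodes[cur].key) {
		sk_nodes[cur].left = sk_insert(sk_nodes[cur].left, ins);
		if (sk_prio(sk_nodes[cur].left) < sk_prio(cur))
			cur = sk_rotate_right(cur);
	} else {
		sk_nodes[cur].right = sk_insert(sk_nodes[cur].right, ins);
		if (sk_prio(sk_nodes[cur].right) < sk_prio(cur))
			cur = sk_rotate_left(cur);
	}
	return cur;
}

/* Allocate a node for key and insert it; return the new root. */
uint32_t sk_add(uint32_t root, int key)
{
	uint32_t n = sk_count++;

	assert(n < SK_MAX_NODES);
	sk_nodes[n].key = key;
	sk_nodes[n].left = SK_NIL;
	sk_nodes[n].right = SK_NIL;
	return sk_insert(root, n);
}

/* In-order walk: append keys to out starting at index n. */
uint32_t sk_walk(uint32_t cur, int *out, uint32_t n)
{
	if (cur == SK_NIL)
		return n;
	n = sk_walk(sk_nodes[cur].left, out, n);
	out[n++] = sk_nodes[cur].key;
	return sk_walk(sk_nodes[cur].right, out, n);
}
```

Starting from root = SK_NIL, repeated calls to sk_add() followed by
sk_walk() yield the keys in sorted order, which is essentially what
the "treap sort" test in t0080 checks for the real implementation.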

 .gitignore         |    1 +
 Makefile           |    3 +-
 t/t0080-vcs-svn.sh |   22 +++++
 test-treap.c       |   65 ++++++++++++++
 vcs-svn/LICENSE    |    3 +
 vcs-svn/trp.h      |  236 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/trp.txt    |  103 +++++++++++++++++++++++
 7 files changed, 432 insertions(+), 1 deletions(-)
 create mode 100644 test-treap.c
 create mode 100644 vcs-svn/trp.h
 create mode 100644 vcs-svn/trp.txt

diff --git a/.gitignore b/.gitignore
index 1e64a6a..af47653 100644
--- a/.gitignore
+++ b/.gitignore
@@ -173,6 +173,7 @@
 /test-run-command
 /test-sha1
 /test-sigchain
+/test-treap
 /common-cmds.h
 *.tar.gz
 *.dsc
diff --git a/Makefile b/Makefile
index eb471e7..e7c33ec 100644
--- a/Makefile
+++ b/Makefile
@@ -415,6 +415,7 @@ TEST_PROGRAMS_NEED_X += test-path-utils
 TEST_PROGRAMS_NEED_X += test-run-command
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sigchain
+TEST_PROGRAMS_NEED_X += test-treap
 TEST_PROGRAMS_NEED_X += test-index-version
 
 TEST_PROGRAMS = $(patsubst %,%$X,$(TEST_PROGRAMS_NEED_X))
@@ -1865,7 +1866,7 @@ xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
 $(VCSSVN_OBJS): \
-	vcs-svn/obj_pool.h
+	vcs-svn/obj_pool.h vcs-svn/trp.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/t/t0080-vcs-svn.sh b/t/t0080-vcs-svn.sh
index 3f29496..ce02c58 100755
--- a/t/t0080-vcs-svn.sh
+++ b/t/t0080-vcs-svn.sh
@@ -76,4 +76,26 @@ test_expect_success 'obj pool: high-water mark' '
 	test_cmp expected actual
 '
 
+test_expect_success 'treap sort' '
+	cat <<-\EOF >unsorted &&
+	68
+	12
+	13
+	13
+	68
+	13
+	13
+	21
+	10
+	11
+	12
+	13
+	13
+	EOF
+	sort unsorted >expected &&
+
+	test-treap <unsorted >actual &&
+	test_cmp expected actual
+'
+
 test_done
diff --git a/test-treap.c b/test-treap.c
new file mode 100644
index 0000000..cdba511
--- /dev/null
+++ b/test-treap.c
@@ -0,0 +1,65 @@
+/*
+ * test-treap.c: code to exercise the svn importer's treap structure
+ */
+
+#include "cache.h"
+#include "vcs-svn/obj_pool.h"
+#include "vcs-svn/trp.h"
+
+struct int_node {
+	uintmax_t n;
+	struct trp_node children;
+};
+
+obj_pool_gen(node, struct int_node, 3)
+
+static int node_cmp(struct int_node *a, struct int_node *b)
+{
+	return (a->n > b->n) - (a->n < b->n);
+}
+
+trp_gen(static, treap_, struct int_node, children, node, node_cmp)
+
+static void strtonode(struct int_node *item, const char *s)
+{
+	char *end;
+	item->n = strtoumax(s, &end, 10);
+	if (*s == '\0' || (*end != '\n' && *end != '\0'))
+		die("invalid integer: %s", s);
+}
+
+int main(int argc, char *argv[])
+{
+	struct strbuf sb = STRBUF_INIT;
+	struct trp_root root = { ~0 };
+	uint32_t item;
+
+	if (argc != 1)
+		usage("test-treap < ints");
+
+	while (strbuf_getline(&sb, stdin, '\n') != EOF) {
+		item = node_alloc(1);
+		strtonode(node_pointer(item), sb.buf);
+		treap_insert(&root, node_pointer(item));
+	}
+
+	item = node_offset(treap_first(&root));
+	while (~item) {
+		uint32_t next;
+		struct int_node *tmp = node_pointer(node_alloc(1));
+
+		tmp->n = node_pointer(item)->n;
+		next = node_offset(treap_next(&root, node_pointer(item)));
+
+		treap_remove(&root, node_pointer(item));
+		item = node_offset(treap_nsearch(&root, tmp));
+
+		if (item != next && (!~item || node_pointer(item)->n != tmp->n))
+			die("found %"PRIuMAX" in place of %"PRIuMAX"",
+				~item ? node_pointer(item)->n : ~(uintmax_t) 0,
+				~next ? node_pointer(next)->n : ~(uintmax_t) 0);
+		printf("%"PRIuMAX"\n", tmp->n);
+	}
+	node_reset();
+	return 0;
+}
diff --git a/vcs-svn/LICENSE b/vcs-svn/LICENSE
index 6e52372..a3d384c 100644
--- a/vcs-svn/LICENSE
+++ b/vcs-svn/LICENSE
@@ -1,6 +1,9 @@
 Copyright (C) 2010 David Barr <david.barr@cordelta.com>.
 All rights reserved.
 
+Copyright (C) 2008 Jason Evans <jasone@canonware.com>.
+All rights reserved.
+
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions
 are met:
diff --git a/vcs-svn/trp.h b/vcs-svn/trp.h
new file mode 100644
index 0000000..1f5f51f
--- /dev/null
+++ b/vcs-svn/trp.h
@@ -0,0 +1,236 @@
+/*
+ * C macro implementation of treaps.
+ *
+ * Usage:
+ *   #include <stdint.h>
+ *   #include "trp.h"
+ *   trp_gen(...)
+ *
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#ifndef TRP_H_
+#define TRP_H_
+
+#define MAYBE_UNUSED __attribute__((__unused__))
+
+/* Node structure. */
+struct trp_node {
+	uint32_t trpn_left;
+	uint32_t trpn_right;
+};
+
+/* Root structure. */
+struct trp_root {
+	uint32_t trp_root;
+};
+
+/* Pointer/Offset conversion. */
+#define trpn_pointer(a_base, a_offset) (a_base##_pointer(a_offset))
+#define trpn_offset(a_base, a_pointer) (a_base##_offset(a_pointer))
+#define trpn_modify(a_base, a_offset) \
+	do { \
+		if ((a_offset) < a_base##_pool.committed) { \
+			uint32_t old_offset = (a_offset);\
+			(a_offset) = a_base##_alloc(1); \
+			*trpn_pointer(a_base, a_offset) = \
+				*trpn_pointer(a_base, old_offset); \
+		} \
+	} while (0);
+
+/* Left accessors. */
+#define trp_left_get(a_base, a_field, a_node) \
+	(trpn_pointer(a_base, a_node)->a_field.trpn_left)
+#define trp_left_set(a_base, a_field, a_node, a_left) \
+	do { \
+		trpn_modify(a_base, a_node); \
+		trp_left_get(a_base, a_field, a_node) = (a_left); \
+	} while(0)
+
+/* Right accessors. */
+#define trp_right_get(a_base, a_field, a_node) \
+	(trpn_pointer(a_base, a_node)->a_field.trpn_right)
+#define trp_right_set(a_base, a_field, a_node, a_right) \
+	do { \
+		trpn_modify(a_base, a_node); \
+		trp_right_get(a_base, a_field, a_node) = (a_right); \
+	} while(0)
+
+/*
+ * Fibonacci hash function.
+ * The multiplier is the nearest prime to (2^32 times (√5 - 1)/2).
+ * See Knuth §6.4: volume 3, 3rd ed, p518.
+ */
+#define trpn_hash(a_node) (uint32_t) (2654435761u * (a_node))
+
+/* Priority accessors. */
+#define trp_prio_get(a_node) trpn_hash(a_node)
+
+/* Node initializer. */
+#define trp_node_new(a_base, a_field, a_node) \
+	do { \
+		trp_left_set(a_base, a_field, (a_node), ~0); \
+		trp_right_set(a_base, a_field, (a_node), ~0); \
+	} while(0)
+
+/* Internal utility macros. */
+#define trpn_first(a_base, a_field, a_root, r_node) \
+	do { \
+		(r_node) = (a_root); \
+		if ((r_node) == ~0) \
+			return NULL; \
+		while (~trp_left_get(a_base, a_field, (r_node))) \
+			(r_node) = trp_left_get(a_base, a_field, (r_node)); \
+	} while (0)
+
+#define trpn_rotate_left(a_base, a_field, a_node, r_node) \
+	do { \
+		(r_node) = trp_right_get(a_base, a_field, (a_node)); \
+		trp_right_set(a_base, a_field, (a_node), \
+			trp_left_get(a_base, a_field, (r_node))); \
+		trp_left_set(a_base, a_field, (r_node), (a_node)); \
+	} while(0)
+
+#define trpn_rotate_right(a_base, a_field, a_node, r_node) \
+	do { \
+		(r_node) = trp_left_get(a_base, a_field, (a_node)); \
+		trp_left_set(a_base, a_field, (a_node), \
+			trp_right_get(a_base, a_field, (r_node))); \
+		trp_right_set(a_base, a_field, (r_node), (a_node)); \
+	} while(0)
+
+#define trp_gen(a_attr, a_pre, a_type, a_field, a_base, a_cmp) \
+a_attr a_type MAYBE_UNUSED *a_pre##first(struct trp_root *treap) \
+{ \
+	uint32_t ret; \
+	trpn_first(a_base, a_field, treap->trp_root, ret); \
+	return trpn_pointer(a_base, ret); \
+} \
+a_attr a_type MAYBE_UNUSED *a_pre##next(struct trp_root *treap, a_type *node) \
+{ \
+	uint32_t ret; \
+	uint32_t offset = trpn_offset(a_base, node); \
+	if (~trp_right_get(a_base, a_field, offset)) { \
+		trpn_first(a_base, a_field, \
+			trp_right_get(a_base, a_field, offset), ret); \
+	} else { \
+		uint32_t tnode = treap->trp_root; \
+		ret = ~0; \
+		while (1) { \
+			int cmp = (a_cmp)(trpn_pointer(a_base, offset), \
+				trpn_pointer(a_base, tnode)); \
+			if (cmp < 0) { \
+				ret = tnode; \
+				tnode = trp_left_get(a_base, a_field, tnode); \
+			} else if (cmp > 0) { \
+				tnode = trp_right_get(a_base, a_field, tnode); \
+			} else { \
+				break; \
+			} \
+		} \
+	} \
+	return trpn_pointer(a_base, ret); \
+} \
+a_attr a_type MAYBE_UNUSED *a_pre##search(struct trp_root *treap, a_type *key) \
+{ \
+	int cmp; \
+	uint32_t ret = treap->trp_root; \
+	while (~ret && (cmp = (a_cmp)(key, trpn_pointer(a_base,ret)))) { \
+		if (cmp < 0) { \
+			ret = trp_left_get(a_base, a_field, ret); \
+		} else { \
+			ret = trp_right_get(a_base, a_field, ret); \
+		} \
+	} \
+	return trpn_pointer(a_base, ret); \
+} \
+a_attr a_type MAYBE_UNUSED *a_pre##nsearch(struct trp_root *treap, a_type *key) \
+{ \
+	int cmp; \
+	uint32_t ret = treap->trp_root; \
+	while (~ret && (cmp = (a_cmp)(key, trpn_pointer(a_base,ret)))) { \
+		if (cmp < 0) { \
+			if (!~trp_left_get(a_base, a_field, ret)) \
+				break; \
+			ret = trp_left_get(a_base, a_field, ret); \
+		} else { \
+			ret = trp_right_get(a_base, a_field, ret); \
+		} \
+	} \
+	return trpn_pointer(a_base, ret); \
+} \
+a_attr uint32_t MAYBE_UNUSED a_pre##insert_recurse(uint32_t cur_node, uint32_t ins_node) \
+{ \
+	if (cur_node == ~0) { \
+		return (ins_node); \
+	} else { \
+		uint32_t ret; \
+		int cmp = (a_cmp)(trpn_pointer(a_base, ins_node), \
+					trpn_pointer(a_base, cur_node)); \
+		if (cmp < 0) { \
+			uint32_t left = a_pre##insert_recurse( \
+				trp_left_get(a_base, a_field, cur_node), ins_node); \
+			trp_left_set(a_base, a_field, cur_node, left); \
+			if (trp_prio_get(left) < trp_prio_get(cur_node)) \
+				trpn_rotate_right(a_base, a_field, cur_node, ret); \
+			else \
+				ret = cur_node; \
+		} else { \
+			uint32_t right = a_pre##insert_recurse( \
+				trp_right_get(a_base, a_field, cur_node), ins_node); \
+			trp_right_set(a_base, a_field, cur_node, right); \
+			if (trp_prio_get(right) < trp_prio_get(cur_node)) \
+				trpn_rotate_left(a_base, a_field, cur_node, ret); \
+			else \
+				ret = cur_node; \
+		} \
+		return (ret); \
+	} \
+} \
+a_attr void MAYBE_UNUSED a_pre##insert(struct trp_root *treap, a_type *node) \
+{ \
+	uint32_t offset = trpn_offset(a_base, node); \
+	trp_node_new(a_base, a_field, offset); \
+	treap->trp_root = a_pre##insert_recurse(treap->trp_root, offset); \
+} \
+a_attr uint32_t MAYBE_UNUSED a_pre##remove_recurse(uint32_t cur_node, uint32_t rem_node) \
+{ \
+	int cmp = a_cmp(trpn_pointer(a_base, rem_node), \
+			trpn_pointer(a_base, cur_node)); \
+	if (cmp == 0) { \
+		uint32_t ret; \
+		uint32_t left = trp_left_get(a_base, a_field, cur_node); \
+		uint32_t right = trp_right_get(a_base, a_field, cur_node); \
+		if (left == ~0) { \
+			if (right == ~0) \
+				return (~0); \
+		} else if (right == ~0 || trp_prio_get(left) < trp_prio_get(right)) { \
+			trpn_rotate_right(a_base, a_field, cur_node, ret); \
+			right = a_pre##remove_recurse(cur_node, rem_node); \
+			trp_right_set(a_base, a_field, ret, right); \
+			return (ret); \
+		} \
+		trpn_rotate_left(a_base, a_field, cur_node, ret); \
+		left = a_pre##remove_recurse(cur_node, rem_node); \
+		trp_left_set(a_base, a_field, ret, left); \
+		return (ret); \
+	} else if (cmp < 0) { \
+		uint32_t left = a_pre##remove_recurse( \
+			trp_left_get(a_base, a_field, cur_node), rem_node); \
+		trp_left_set(a_base, a_field, cur_node, left); \
+		return (cur_node); \
+	} else { \
+		uint32_t right = a_pre##remove_recurse( \
+			trp_right_get(a_base, a_field, cur_node), rem_node); \
+		trp_right_set(a_base, a_field, cur_node, right); \
+		return (cur_node); \
+	} \
+} \
+a_attr void MAYBE_UNUSED a_pre##remove(struct trp_root *treap, a_type *node) \
+{ \
+	treap->trp_root = a_pre##remove_recurse(treap->trp_root, \
+		trpn_offset(a_base, node)); \
+} \
+
+#endif
diff --git a/vcs-svn/trp.txt b/vcs-svn/trp.txt
new file mode 100644
index 0000000..eb4c191
--- /dev/null
+++ b/vcs-svn/trp.txt
@@ -0,0 +1,103 @@
+Motivation
+==========
+
+Treaps provide a memory-efficient binary search tree structure.
+Insertion/deletion/search are about as fast in the average
+case as red-black trees and the chances of worst-case behavior are
+vanishingly small, thanks to (pseudo-)randomness.  The bad worst-case
+behavior is a small price to pay, given that treaps are much simpler
+to implement.
+
+API
+===
+
+The trp API generates a data structure and functions to handle a
+large growing set of objects stored in a pool.
+
+The caller:
+
+. Specifies parameters for the generated functions with the
+  trp_gen(static, foo_, ...) macro.
+
+. Allocates a `struct trp_root` variable and sets it to {~0}.
+
+. Adds new nodes to the set using `foo_insert`.
+
+. Can find a specific item in the set using `foo_search`.
+
+. Can iterate over items in the set using `foo_first` and `foo_next`.
+
+. Can remove an item from the set using `foo_remove`.
+
+Example:
+
+----
+struct ex_node {
+	const char *s;
+	struct trp_node ex_link;
+};
+static struct trp_root ex_base = {~0};
+obj_pool_gen(ex, struct ex_node, 4096);
+trp_gen(static, ex_, struct ex_node, ex_link, ex, strcmp)
+struct ex_node *item;
+
+item = ex_pointer(ex_alloc(1));
+item->s = "hello";
+ex_insert(&ex_base, item);
+item = ex_pointer(ex_alloc(1));
+item->s = "goodbye";
+ex_insert(&ex_base, item);
+for (item = ex_first(&ex_base); item; item = ex_next(&ex_base, item))
+	printf("%s\n", item->s);
+----
+
+Functions
+---------
+
+trp_gen(attr, foo_, node_type, link_field, pool, cmp)::
+
+	Generate a type-specific treap implementation.
++
+. The storage class for generated functions will be 'attr' (e.g., `static`).
+. Generated function names are prefixed with 'foo_' (e.g., `treap_`).
+. Treap nodes will be of type 'node_type' (e.g., `struct treap_node`).
+  This type must be a struct with at least one `struct trp_node` field
+  to point to its children.
+. The field used to access child nodes will be 'link_field'.
+. All treap nodes must lie in the 'pool' object pool.
+. Treap nodes must be totally ordered by the 'cmp' relation, with the
+  following prototype:
++
+int (*cmp)(node_type \*a, node_type \*b)
++
+and returning a value less than, equal to, or greater than zero
+according to the result of comparison.
+
+void foo_insert(struct trp_root *treap, node_type \*node)::
+
+	Insert node into treap.  If inserted multiple times,
+	a node will appear in the treap multiple times.
+
+void foo_remove(struct trp_root *treap, node_type \*node)::
+
+	Remove node from treap.  Caller must ensure node is
+	present in treap before using this function.
+
+node_type *foo_search(struct trp_root \*treap, node_type \*key)::
+
+	Search for a node that matches key.  If no match is found,
+	result is NULL.
+
+node_type *foo_nsearch(struct trp_root \*treap, node_type \*key)::
+
+	Like `foo_search`, but if the key is missing, return what
+	would be key's successor, were key in treap (NULL if no
+	successor).
+
+node_type *foo_first(struct trp_root \*treap)::
+
+	Find the first item from the treap, in sorted order.
+
+node_type *foo_next(struct trp_root \*treap, node_type \*node)::
+
+	Find the next item.
-- 
1.7.2.1.544.ga752d.dirty

* [PATCH 05/10] Add string-specific memory pool
  2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
                       ` (3 preceding siblings ...)
  2010-08-09 22:17     ` [PATCH 04/10] Add treap implementation Jonathan Nieder
@ 2010-08-09 22:34     ` Jonathan Nieder
  2010-08-12 17:22       ` Junio C Hamano
  2010-08-09 22:39     ` [PATCH 06/10] Add stream helper library Jonathan Nieder
                       ` (5 subsequent siblings)
  10 siblings, 1 reply; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-09 22:34 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

From: David Barr <david.barr@cordelta.com>

Intern strings so they can be compared by address and stored without
wasting space.

This library uses the macros in obj_pool.h and trp.h to create a
memory pool for strings and expose an API for handling them.

[rr: added API docs]
[jn: with some API simplifications, new documentation and tests]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
New test.  The return value from pool_tok_seq is not checked by
the vcs-svn lib but trying to use it in tests revealed it was not
so intuitive.  pool_tok_seq() was behaving strangely when passed
an array of size 0; I think there is nothing sane to do in that
case --- maybe it should abort().  The API was passing around
char * that cannot be modified; changed to const char *.

Another set of eyes on this would be welcome.
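
To make the pool_tok_seq contract concrete, here is a minimal
standalone sketch of the documented behavior.  It is not the code in
this patch: the `sketch_` names and the flat lookup table (standing in
for the treap-backed pool) are assumptions made for illustration, and
a size of 0 is simply rejected with assert().

```c
#define _POSIX_C_SOURCE 200809L	/* for strtok_r */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_STRINGS 64
#define SKETCH_MAX_LEN 32

/* Hypothetical stand-in for the treap-backed pool: a flat table. */
static char sketch_table[SKETCH_MAX_STRINGS][SKETCH_MAX_LEN];
static uint32_t sketch_count;

/* Return the key for s, adding s to the table on first sight. */
static uint32_t sketch_intern(const char *s)
{
	uint32_t i;

	if (!s)
		return ~0u;
	for (i = 0; i < sketch_count; i++)
		if (!strcmp(sketch_table[i], s))
			return i;
	assert(sketch_count < SKETCH_MAX_STRINGS);
	assert(strlen(s) < SKETCH_MAX_LEN);
	strcpy(sketch_table[sketch_count], s);
	return sketch_count++;
}

/*
 * Split str (which is modified) into tokens and store their keys in
 * seq.  The array is always ~0-terminated.  Returns the index after
 * the last token, or sz if the tokens did not fit.
 */
static uint32_t sketch_tok_seq(uint32_t sz, uint32_t *seq,
			       const char *delim, char *str)
{
	char *context = NULL;
	char *token = str ? strtok_r(str, delim, &context) : NULL;
	uint32_t length;

	assert(sz > 0);
	for (length = 0; length < sz; length++) {
		seq[length] = sketch_intern(token);
		if (!token)
			return length;
		token = strtok_r(NULL, delim, &context);
	}
	seq[sz - 1] = ~0u;	/* out of room: drop the last key, terminate */
	return sz;
}
```

With a three-entry array, "a,a" yields two equal keys followed by the
~0 terminator, while "a,b,c" overflows: only the first two keys
survive and the return value is sz.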

 .gitignore              |    1 +
 Makefile                |    9 +++-
 t/t0080-vcs-svn.sh      |   16 +++++++
 test-string-pool.c      |   31 ++++++++++++++
 vcs-svn/string_pool.c   |  102 +++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/string_pool.h   |   11 +++++
 vcs-svn/string_pool.txt |   43 ++++++++++++++++++++
 7 files changed, 210 insertions(+), 3 deletions(-)
 create mode 100644 test-string-pool.c
 create mode 100644 vcs-svn/string_pool.c
 create mode 100644 vcs-svn/string_pool.h
 create mode 100644 vcs-svn/string_pool.txt

diff --git a/.gitignore b/.gitignore
index af47653..9f109db 100644
--- a/.gitignore
+++ b/.gitignore
@@ -173,6 +173,7 @@
 /test-run-command
 /test-sha1
 /test-sigchain
+/test-string-pool
 /test-treap
 /common-cmds.h
 *.tar.gz
diff --git a/Makefile b/Makefile
index e7c33ec..24103c9 100644
--- a/Makefile
+++ b/Makefile
@@ -415,6 +415,7 @@ TEST_PROGRAMS_NEED_X += test-path-utils
 TEST_PROGRAMS_NEED_X += test-run-command
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sigchain
+TEST_PROGRAMS_NEED_X += test-string-pool
 TEST_PROGRAMS_NEED_X += test-treap
 TEST_PROGRAMS_NEED_X += test-index-version
 
@@ -1742,7 +1743,7 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-VCSSVN_OBJS =
+VCSSVN_OBJS = vcs-svn/string_pool.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1866,7 +1867,7 @@ xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
 $(VCSSVN_OBJS): \
-	vcs-svn/obj_pool.h vcs-svn/trp.h
+	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
@@ -2017,10 +2018,12 @@ test-delta$X: diff-delta.o patch-delta.o
 
 test-parse-options$X: parse-options.o
 
+test-string-pool$X: vcs-svn/lib.a
+
 .PRECIOUS: $(TEST_OBJS)
 
 test-%$X: test-%.o $(GITLIBS)
-	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS)
+	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS)
 
 check-sha1:: test-sha1$X
 	./test-sha1.sh
diff --git a/t/t0080-vcs-svn.sh b/t/t0080-vcs-svn.sh
index ce02c58..99a314b 100755
--- a/t/t0080-vcs-svn.sh
+++ b/t/t0080-vcs-svn.sh
@@ -76,6 +76,22 @@ test_expect_success 'obj pool: high-water mark' '
 	test_cmp expected actual
 '
 
+test_expect_success 'string pool' '
+	echo a does not equal b >expected.differ &&
+	echo a equals a >expected.match &&
+	echo equals equals equals >expected.matchmore &&
+
+	test-string-pool "a,--b" >actual.differ &&
+	test-string-pool "a,a" >actual.match &&
+	test-string-pool "equals-equals" >actual.matchmore &&
+	test_must_fail test-string-pool a,a,a &&
+	test_must_fail test-string-pool a &&
+
+	test_cmp expected.differ actual.differ &&
+	test_cmp expected.match actual.match &&
+	test_cmp expected.matchmore actual.matchmore
+'
+
 test_expect_success 'treap sort' '
 	cat <<-\EOF >unsorted &&
 	68
diff --git a/test-string-pool.c b/test-string-pool.c
new file mode 100644
index 0000000..2adf84b
--- /dev/null
+++ b/test-string-pool.c
@@ -0,0 +1,31 @@
+/*
+ * test-string-pool.c: code to exercise the svn importer's string pool
+ */
+
+#include "git-compat-util.h"
+#include "vcs-svn/string_pool.h"
+
+int main(int argc, char *argv[])
+{
+	const uint32_t unequal = pool_intern("does not equal");
+	const uint32_t equal = pool_intern("equals");
+	uint32_t buf[3];
+	uint32_t n;
+
+	if (argc != 2)
+		usage("test-string-pool <string>,<string>");
+
+	n = pool_tok_seq(3, buf, ",-", argv[1]);
+	if (n >= 3)
+		die("too many strings");
+	if (n <= 1)
+		die("too few strings");
+
+	buf[2] = buf[1];
+	buf[1] = (buf[0] == buf[2]) ? equal : unequal;
+	pool_print_seq(3, buf, ' ', stdout);
+	fputc('\n', stdout);
+
+	pool_reset();
+	return 0;
+}
diff --git a/vcs-svn/string_pool.c b/vcs-svn/string_pool.c
new file mode 100644
index 0000000..550f0e5
--- /dev/null
+++ b/vcs-svn/string_pool.c
@@ -0,0 +1,102 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+#include "trp.h"
+#include "obj_pool.h"
+#include "string_pool.h"
+
+static struct trp_root tree = { ~0 };
+
+struct node {
+	uint32_t offset;
+	struct trp_node children;
+};
+
+/* Two memory pools: one for struct node, and another for strings */
+obj_pool_gen(node, struct node, 4096)
+obj_pool_gen(string, char, 4096)
+
+static char *node_value(struct node *node)
+{
+	return node ? string_pointer(node->offset) : NULL;
+}
+
+static int node_cmp(struct node *a, struct node *b)
+{
+	return strcmp(node_value(a), node_value(b));
+}
+
+/* Build a Treap from the node structure (a trp_node w/ offset) */
+trp_gen(static, tree_, struct node, children, node, node_cmp);
+
+const char *pool_fetch(uint32_t entry)
+{
+	return node_value(node_pointer(entry));
+}
+
+uint32_t pool_intern(const char *key)
+{
+	/* Canonicalize key */
+	struct node *match = NULL, *node;
+	uint32_t key_len;
+	if (key == NULL)
+		return ~0;
+	key_len = strlen(key) + 1;
+	node = node_pointer(node_alloc(1));
+	node->offset = string_alloc(key_len);
+	strcpy(node_value(node), key);
+	match = tree_search(&tree, node);
+	if (!match) {
+		tree_insert(&tree, node);
+	} else {
+		node_free(1);
+		string_free(key_len);
+		node = match;
+	}
+	return node_offset(node);
+}
+
+uint32_t pool_tok_r(char *str, const char *delim, char **saveptr)
+{
+	char *token = strtok_r(str, delim, saveptr);
+	return token ? pool_intern(token) : ~0;
+}
+
+void pool_print_seq(uint32_t len, uint32_t *seq, char delim, FILE *stream)
+{
+	uint32_t i;
+	for (i = 0; i < len && ~seq[i]; i++) {
+		fputs(pool_fetch(seq[i]), stream);
+		if (i < len - 1 && ~seq[i + 1])
+			fputc(delim, stream);
+	}
+}
+
+uint32_t pool_tok_seq(uint32_t sz, uint32_t *seq, const char *delim, char *str)
+{
+	char *context = NULL;
+	uint32_t token = ~0;
+	uint32_t length;
+
+	if (sz == 0)
+		return ~0;
+	if (str)
+		token = pool_tok_r(str, delim, &context);
+	for (length = 0; length < sz; length++) {
+		seq[length] = token;
+		if (token == ~0)
+			return length;
+		token = pool_tok_r(NULL, delim, &context);
+	}
+	seq[sz - 1] = ~0;
+	return sz;
+}
+
+void pool_reset(void)
+{
+	node_reset();
+	string_reset();
+}
diff --git a/vcs-svn/string_pool.h b/vcs-svn/string_pool.h
new file mode 100644
index 0000000..222fb66
--- /dev/null
+++ b/vcs-svn/string_pool.h
@@ -0,0 +1,11 @@
+#ifndef STRING_POOL_H_
+#define STRING_POOL_H_
+
+uint32_t pool_intern(const char *key);
+const char *pool_fetch(uint32_t entry);
+uint32_t pool_tok_r(char *str, const char *delim, char **saveptr);
+void pool_print_seq(uint32_t len, uint32_t *seq, char delim, FILE *stream);
+uint32_t pool_tok_seq(uint32_t sz, uint32_t *seq, const char *delim, char *str);
+void pool_reset(void);
+
+#endif
diff --git a/vcs-svn/string_pool.txt b/vcs-svn/string_pool.txt
new file mode 100644
index 0000000..1b41f15
--- /dev/null
+++ b/vcs-svn/string_pool.txt
@@ -0,0 +1,43 @@
+string_pool API
+===============
+
+The string_pool API provides facilities for replacing strings
+with integer keys that can be more easily compared and stored.
+The facilities are designed so that one could teach Git without
+too much trouble to store the information needed for these keys to
+remain valid over multiple executions.
+
+Functions
+---------
+
+pool_intern::
+	Include a string in the string pool and get its key.
+	If that string is already in the pool, retrieves its
+	existing key.
+
+pool_fetch::
+	Retrieve the string associated with a given key.
+
+pool_tok_r::
+	Extract the key of the next token from a string.
+	Interface mimics strtok_r.
+
+pool_print_seq::
+	Print a sequence of strings named by key to a file, using the
+	specified delimiter to separate them.
+
+	If NULL (key ~0) appears in the sequence, the sequence ends
+	early.
+
+pool_tok_seq::
+	Split a string into tokens, storing the keys of segments
+	into a caller-provided array.
+
+	Unless sz is 0, the array will always be ~0-terminated.
+	If there is not enough room for all the tokens, the
+	array holds as many tokens as fit in the entries before
+	the terminating ~0.  Return value is the index after the
+	last token, or sz if the tokens did not fit.
+
+pool_reset::
+	Deallocate storage for the string pool.
-- 
1.7.2.1.544.ga752d.dirty

* [PATCH 06/10] Add stream helper library
  2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
                       ` (4 preceding siblings ...)
  2010-08-09 22:34     ` [PATCH 05/10] Add string-specific memory pool Jonathan Nieder
@ 2010-08-09 22:39     ` Jonathan Nieder
  2010-08-09 22:48     ` [PATCH 07/10] Infrastructure to write revisions in fast-export format Jonathan Nieder
                       ` (4 subsequent siblings)
  10 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-09 22:39 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

From: David Barr <david.barr@cordelta.com>

This library provides thread-unsafe fgets()- and fread()-like
functions where the caller does not have to supply a buffer.  It
maintains a couple of static buffers and provides an API to use
them.

[rr: allow input from files other than stdin]
[jn: with tests, documentation, and error handling improvements]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
New tests and API docs.  The return value from buffer_deinit
can be used to check for errors now (I found this useful when
writing tests).
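
For readers who want to see the error-reporting idea in isolation,
here is a minimal standalone sketch.  It is not the code in this
patch: the `sketch_` functions are hypothetical, take the stream and
buffer as arguments instead of using static state, and only mirror the
behavior described for buffer_read_line and buffer_deinit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one line into buf and strip the trailing newline.
 * Returns NULL at end of file, on error, or if the line did not fit.
 */
static char *sketch_read_line(FILE *in, char *buf, size_t size)
{
	size_t len;

	assert(buf && size > 1);
	if (!fgets(buf, (int)size, in))
		return NULL;	/* error or data exhausted */
	len = strlen(buf);
	if (len && buf[len - 1] == '\n')
		buf[len - 1] = '\0';
	else if (!feof(in))
		return NULL;	/* line was too long for buf */
	return buf;
}

/*
 * Stop reading from the stream, closing it unless it is stdin.
 * Returns nonzero if the error indicator was set or fclose failed.
 */
static int sketch_close(FILE *in)
{
	int err = ferror(in);

	if (in != stdin)
		err |= fclose(in);
	return err;
}
```

A caller loops on sketch_read_line until it returns NULL and then
checks sketch_close to tell a clean end of file from a read error.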

 .gitignore              |    1 +
 Makefile                |    8 +++-
 t/t0080-vcs-svn.sh      |   54 ++++++++++++++++++++++++++
 test-line-buffer.c      |   46 ++++++++++++++++++++++
 vcs-svn/line_buffer.c   |   97 +++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/line_buffer.h   |   12 ++++++
 vcs-svn/line_buffer.txt |   58 ++++++++++++++++++++++++++++
 7 files changed, 274 insertions(+), 2 deletions(-)
 create mode 100644 test-line-buffer.c
 create mode 100644 vcs-svn/line_buffer.c
 create mode 100644 vcs-svn/line_buffer.h
 create mode 100644 vcs-svn/line_buffer.txt

diff --git a/.gitignore b/.gitignore
index 9f109db..8c0512e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -166,6 +166,7 @@
 /test-dump-cache-tree
 /test-genrandom
 /test-index-version
+/test-line-buffer
 /test-match-trees
 /test-obj-pool
 /test-parse-options
diff --git a/Makefile b/Makefile
index 24103c9..a76cce5 100644
--- a/Makefile
+++ b/Makefile
@@ -408,6 +408,7 @@ TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-genrandom
+TEST_PROGRAMS_NEED_X += test-line-buffer
 TEST_PROGRAMS_NEED_X += test-match-trees
 TEST_PROGRAMS_NEED_X += test-obj-pool
 TEST_PROGRAMS_NEED_X += test-parse-options
@@ -1743,7 +1744,7 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-VCSSVN_OBJS = vcs-svn/string_pool.o
+VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1867,7 +1868,8 @@ xdiff-interface.o $(XDIFF_OBJS): \
 	xdiff/xutils.h xdiff/xprepare.h xdiff/xdiffi.h xdiff/xemit.h
 
 $(VCSSVN_OBJS): \
-	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h
+	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h \
+	vcs-svn/line_buffer.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
@@ -2016,6 +2018,8 @@ test-date$X: date.o ctype.o
 
 test-delta$X: diff-delta.o patch-delta.o
 
+test-line-buffer$X: vcs-svn/lib.a
+
 test-parse-options$X: parse-options.o
 
 test-string-pool$X: vcs-svn/lib.a
diff --git a/t/t0080-vcs-svn.sh b/t/t0080-vcs-svn.sh
index 99a314b..d3225ad 100755
--- a/t/t0080-vcs-svn.sh
+++ b/t/t0080-vcs-svn.sh
@@ -76,6 +76,60 @@ test_expect_success 'obj pool: high-water mark' '
 	test_cmp expected actual
 '
 
+test_expect_success 'line buffer' '
+	echo HELLO >expected1 &&
+	printf "%s\n" "" HELLO >expected2 &&
+	echo >expected3 &&
+	printf "%s\n" "" Q | q_to_nul >expected4 &&
+	printf "%s\n" foo "" >expected5 &&
+	printf "%s\n" "" foo >expected6 &&
+
+	test-line-buffer <<-\EOF >actual1 &&
+	5
+	HELLO
+	EOF
+
+	test-line-buffer <<-\EOF >actual2 &&
+	0
+
+	5
+	HELLO
+	EOF
+
+	q_to_nul <<-\EOF |
+	1
+	Q
+	EOF
+	test-line-buffer >actual3 &&
+
+	q_to_nul <<-\EOF |
+	0
+
+	1
+	Q
+	EOF
+	test-line-buffer >actual4 &&
+
+	test-line-buffer <<-\EOF >actual5 &&
+	5
+	foo
+	EOF
+
+	test-line-buffer <<-\EOF >actual6 &&
+	0
+
+	5
+	foo
+	EOF
+
+	test_cmp expected1 actual1 &&
+	test_cmp expected2 actual2 &&
+	test_cmp expected3 actual3 &&
+	test_cmp expected4 actual4 &&
+	test_cmp expected5 actual5 &&
+	test_cmp expected6 actual6
+'
+
 test_expect_success 'string pool' '
 	echo a does not equal b >expected.differ &&
 	echo a equals a >expected.match &&
diff --git a/test-line-buffer.c b/test-line-buffer.c
new file mode 100644
index 0000000..c11bf7f
--- /dev/null
+++ b/test-line-buffer.c
@@ -0,0 +1,46 @@
+/*
+ * test-line-buffer.c: code to exercise the svn importer's input helper
+ *
+ * Input format:
+ *	number NL
+ *	(number bytes) NL
+ *	number NL
+ *	...
+ */
+
+#include "git-compat-util.h"
+#include "vcs-svn/line_buffer.h"
+
+static uint32_t strtouint32(const char *s)
+{
+	char *end;
+	uintmax_t n = strtoumax(s, &end, 10);
+	if (*s == '\0' || *end != '\0')
+		die("invalid count: %s", s);
+	return (uint32_t) n;
+}
+
+int main(int argc, char *argv[])
+{
+	char *s;
+
+	if (argc != 1)
+		usage("test-line-buffer < input.txt");
+	if (buffer_init(NULL))
+		die_errno("open error");
+	while ((s = buffer_read_line())) {
+		s = buffer_read_string(strtouint32(s));
+		fputs(s, stdout);
+		fputc('\n', stdout);
+		buffer_skip_bytes(1);
+		if (!(s = buffer_read_line()))
+			break;
+		buffer_copy_bytes(strtouint32(s) + 1);
+	}
+	if (buffer_deinit())
+		die("input error");
+	if (ferror(stdout))
+		die("output error");
+	buffer_reset();
+	return 0;
+}
diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
new file mode 100644
index 0000000..1543567
--- /dev/null
+++ b/vcs-svn/line_buffer.c
@@ -0,0 +1,97 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+#include "line_buffer.h"
+#include "obj_pool.h"
+
+#define LINE_BUFFER_LEN 10000
+#define COPY_BUFFER_LEN 4096
+
+/* Create memory pool for char sequence of known length */
+obj_pool_gen(blob, char, 4096)
+
+static char line_buffer[LINE_BUFFER_LEN];
+static char byte_buffer[COPY_BUFFER_LEN];
+static FILE *infile;
+
+int buffer_init(const char *filename)
+{
+	infile = filename ? fopen(filename, "r") : stdin;
+	if (!infile)
+		return -1;
+	return 0;
+}
+
+int buffer_deinit(void)
+{
+	int err;
+	if (infile == stdin)
+		return ferror(infile);
+	err = ferror(infile);
+	err |= fclose(infile);
+	return err;
+}
+
+/* Read a line without trailing newline. */
+char *buffer_read_line(void)
+{
+	char *end;
+	if (!fgets(line_buffer, sizeof(line_buffer), infile))
+		/* Error or data exhausted. */
+		return NULL;
+	end = line_buffer + strlen(line_buffer);
+	if (end[-1] == '\n')
+		end[-1] = '\0';
+	else if (feof(infile))
+		; /* No newline at end of file.  That's fine. */
+	else
+		/*
+		 * Line was too long.
+		 * There is probably a saner way to deal with this,
+		 * but for now let's return an error.
+		 */
+		return NULL;
+	return line_buffer;
+}
+
+char *buffer_read_string(uint32_t len)
+{
+	char *s;
+	blob_free(blob_pool.size);
+	s = blob_pointer(blob_alloc(len + 1));
+	s[fread(s, 1, len, infile)] = '\0';
+	return ferror(infile) ? NULL : s;
+}
+
+void buffer_copy_bytes(uint32_t len)
+{
+	uint32_t in;
+	while (len > 0 && !feof(infile) && !ferror(infile)) {
+		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
+		in = fread(byte_buffer, 1, in, infile);
+		len -= in;
+		fwrite(byte_buffer, 1, in, stdout);
+		if (ferror(stdout)) {
+			buffer_skip_bytes(len);
+			return;
+		}
+	}
+}
+
+void buffer_skip_bytes(uint32_t len)
+{
+	uint32_t in;
+	while (len > 0 && !feof(infile) && !ferror(infile)) {
+		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
+		in = fread(byte_buffer, 1, in, infile);
+		len -= in;
+	}
+}
+
+void buffer_reset(void)
+{
+	blob_reset();
+}
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
new file mode 100644
index 0000000..9c78ae1
--- /dev/null
+++ b/vcs-svn/line_buffer.h
@@ -0,0 +1,12 @@
+#ifndef LINE_BUFFER_H_
+#define LINE_BUFFER_H_
+
+int buffer_init(const char *filename);
+int buffer_deinit(void);
+char *buffer_read_line(void);
+char *buffer_read_string(uint32_t len);
+void buffer_copy_bytes(uint32_t len);
+void buffer_skip_bytes(uint32_t len);
+void buffer_reset(void);
+
+#endif
diff --git a/vcs-svn/line_buffer.txt b/vcs-svn/line_buffer.txt
new file mode 100644
index 0000000..8906fb1
--- /dev/null
+++ b/vcs-svn/line_buffer.txt
@@ -0,0 +1,58 @@
+line_buffer API
+===============
+
+The line_buffer library provides a convenient interface for
+mostly-line-oriented input.
+
+Each line is not permitted to exceed 10000 bytes.  The provided
+functions are not thread-safe or async-signal-safe, and like
+`fgets()`, they generally do not function correctly if interrupted
+by a signal without SA_RESTART set.
+
+Calling sequence
+----------------
+
+The calling program:
+
+ - specifies a file to read with `buffer_init`
+ - processes input with `buffer_read_line`, `buffer_read_string`,
+   `buffer_skip_bytes`, and `buffer_copy_bytes`
+ - closes the file with `buffer_deinit`, perhaps to start over and
+   read another file.
+
+Before exiting, the caller can use `buffer_reset` to deallocate
+resources for the benefit of profiling tools.
+
+Functions
+---------
+
+`buffer_init`::
+	Open the named file for input.  If filename is NULL,
+	start reading from stdin.  On failure, returns -1 (with
+	errno indicating the nature of the failure).
+
+`buffer_deinit`::
+	Stop reading from the current file (closing it unless
+	it was stdin).  Returns nonzero if `fclose` fails or
+	the error indicator was set.
+
+`buffer_read_line`::
+	Read a line and strip off the trailing newline.
+	On failure or end of file, returns NULL.
+
+`buffer_read_string`::
+	Read `len` characters of input or up to the end of the
+	file, whichever comes first.  Returns NULL on error.
+	Returns whatever characters were read (possibly "")
+	for end of file.
+
+`buffer_copy_bytes`::
+	Read `len` bytes of input and dump them to the standard output
+	stream.  Returns early for error or end of file.
+
+`buffer_skip_bytes`::
+	Discards `len` bytes from the input stream (stopping early
+	if necessary because of an error or eof).
+
+`buffer_reset`::
+	Deallocates non-static buffers.
-- 
1.7.2.1.544.ga752d.dirty

* [PATCH 07/10] Infrastructure to write revisions in fast-export format
  2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
                       ` (5 preceding siblings ...)
  2010-08-09 22:39     ` [PATCH 06/10] Add stream helper library Jonathan Nieder
@ 2010-08-09 22:48     ` Jonathan Nieder
  2010-08-09 22:55     ` [PATCH 08/10] SVN dump parser Jonathan Nieder
                       ` (3 subsequent siblings)
  10 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-09 22:48 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

From: David Barr <david.barr@cordelta.com>

repo_tree maintains the exporter's state and provides a facility to
call fast_export, which writes objects to stdout suitable for
consumption by fast-import.

The exported functions roughly correspond to Subversion FS operations.

 . repo_add, repo_modify, repo_copy, repo_replace, and repo_delete
   update the current commit, based roughly on the corresponding
   Subversion FS operation.

 . repo_commit calls out to fast_export to write the current commit to
   the fast-import stream in stdout.

 . repo_diff is used by the fast_export module to write the changes
   for a commit.

 . repo_reset erases the exporter's state, so valgrind can be happy.

[rr: squelched compiler warnings]
[jn: removed support for maintaining state on-disk, though we may
want to add it back later]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
No tests for this one, since the next few patches exercise it and this
is not a general-purpose API.  Relative to last round, the
`pool_commit` and `commit_commit` calls have been eliminated; unlike
the `dir_commit` et al calls, those were only meant for committing
state to disk, and the changing high-water mark was not being used.
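
As a rough illustration of the stream being produced, here is a
minimal standalone sketch that formats a single filemodify ('M')
command.  It is not the code in this patch: `sketch_format_modify` is
a hypothetical helper that takes the path as plain strings rather than
string-pool keys and writes to a caller-supplied buffer instead of
stdout.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "M <mode> :<mark> <path>\n" into out, joining the path
 * components with '/'.  Returns 0 on success, -1 if out is too small.
 */
static int sketch_format_modify(char *out, size_t size, uint32_t mode,
				uint32_t mark, const char *const *path,
				size_t depth)
{
	size_t i;
	int n;

	assert(out && size > 0);
	n = snprintf(out, size, "M %06" PRIo32 " :%" PRIu32 " ", mode, mark);
	if (n < 0 || (size_t)n >= size)
		return -1;
	for (i = 0; i < depth; i++) {
		size_t used = strlen(out);

		n = snprintf(out + used, size - used, "%s%s",
			     i ? "/" : "", path[i]);
		if (n < 0 || (size_t)n >= size - used)
			return -1;
	}
	if (strlen(out) + 2 > size)
		return -1;	/* no room for newline and terminator */
	strcat(out, "\n");
	return 0;
}
```

For mode 0100644, mark 1 and the components "trunk" and "README", the
buffer holds "M 100644 :1 trunk/README" followed by a newline, which
is the form fast-import expects for a blob referenced by mark.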

 Makefile              |    5 +-
 vcs-svn/fast_export.c |   74 +++++++++++
 vcs-svn/fast_export.h |   11 ++
 vcs-svn/repo_tree.c   |  328 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/repo_tree.h   |   26 ++++
 5 files changed, 442 insertions(+), 2 deletions(-)
 create mode 100644 vcs-svn/fast_export.c
 create mode 100644 vcs-svn/fast_export.h
 create mode 100644 vcs-svn/repo_tree.c
 create mode 100644 vcs-svn/repo_tree.h

diff --git a/Makefile b/Makefile
index a76cce5..b873399 100644
--- a/Makefile
+++ b/Makefile
@@ -1744,7 +1744,8 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o
+VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o \
+	vcs-svn/repo_tree.o vcs-svn/fast_export.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1869,7 +1870,7 @@ xdiff-interface.o $(XDIFF_OBJS): \
 
 $(VCSSVN_OBJS): \
 	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h \
-	vcs-svn/line_buffer.h
+	vcs-svn/line_buffer.h vcs-svn/repo_tree.h vcs-svn/fast_export.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
new file mode 100644
index 0000000..3a6156f
--- /dev/null
+++ b/vcs-svn/fast_export.c
@@ -0,0 +1,74 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+#include "fast_export.h"
+#include "line_buffer.h"
+#include "repo_tree.h"
+#include "string_pool.h"
+
+#define MAX_GITSVN_LINE_LEN 4096
+
+static uint32_t first_commit_done;
+
+void fast_export_delete(uint32_t depth, uint32_t *path)
+{
+	putchar('D');
+	putchar(' ');
+	pool_print_seq(depth, path, '/', stdout);
+	putchar('\n');
+}
+
+void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
+			uint32_t mark)
+{
+	/* Mode must be 100644, 100755, 120000, or 160000. */
+	printf("M %06"PRIo32" :%"PRIu32" ", mode, mark);
+	pool_print_seq(depth, path, '/', stdout);
+	putchar('\n');
+}
+
+static char gitsvnline[MAX_GITSVN_LINE_LEN];
+void fast_export_commit(uint32_t revision, uint32_t author, char *log,
+			uint32_t uuid, uint32_t url,
+			unsigned long timestamp)
+{
+	if (!log)
+		log = "";
+	if (~uuid && ~url) {
+		snprintf(gitsvnline, MAX_GITSVN_LINE_LEN, "\n\ngit-svn-id: %s@%"PRIu32" %s\n",
+				 pool_fetch(url), revision, pool_fetch(uuid));
+	} else {
+		*gitsvnline = '\0';
+	}
+	printf("commit refs/heads/master\n");
+	printf("committer %s <%s@%s> %ld +0000\n",
+		   ~author ? pool_fetch(author) : "nobody",
+		   ~author ? pool_fetch(author) : "nobody",
+		   ~uuid ? pool_fetch(uuid) : "local", timestamp);
+	printf("data %zd\n%s%s\n",
+		   strlen(log) + strlen(gitsvnline), log, gitsvnline);
+	if (!first_commit_done) {
+		if (revision > 1)
+			printf("from refs/heads/master^0\n");
+		first_commit_done = 1;
+	}
+	repo_diff(revision - 1, revision);
+	fputc('\n', stdout);
+
+	printf("progress Imported commit %"PRIu32".\n\n", revision);
+}
+
+void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len)
+{
+	if (mode == REPO_MODE_LNK) {
+		/* svn symlink blobs start with "link " */
+		buffer_skip_bytes(5);
+		len -= 5;
+	}
+	printf("blob\nmark :%"PRIu32"\ndata %"PRIu32"\n", mark, len);
+	buffer_copy_bytes(len);
+	fputc('\n', stdout);
+}
diff --git a/vcs-svn/fast_export.h b/vcs-svn/fast_export.h
new file mode 100644
index 0000000..2aaaea5
--- /dev/null
+++ b/vcs-svn/fast_export.h
@@ -0,0 +1,11 @@
+#ifndef FAST_EXPORT_H_
+#define FAST_EXPORT_H_
+
+void fast_export_delete(uint32_t depth, uint32_t *path);
+void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
+			uint32_t mark);
+void fast_export_commit(uint32_t revision, uint32_t author, char *log,
+			uint32_t uuid, uint32_t url, unsigned long timestamp);
+void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len);
+
+#endif
diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
new file mode 100644
index 0000000..ba31e72
--- /dev/null
+++ b/vcs-svn/repo_tree.c
@@ -0,0 +1,328 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+
+#include "string_pool.h"
+#include "repo_tree.h"
+#include "obj_pool.h"
+#include "fast_export.h"
+
+#include "trp.h"
+
+struct repo_dirent {
+	uint32_t name_offset;
+	struct trp_node children;
+	uint32_t mode;
+	uint32_t content_offset;
+};
+
+struct repo_dir {
+	struct trp_root entries;
+};
+
+struct repo_commit {
+	uint32_t root_dir_offset;
+};
+
+/* Memory pools for commit, dir and dirent */
+obj_pool_gen(commit, struct repo_commit, 4096)
+obj_pool_gen(dir, struct repo_dir, 4096)
+obj_pool_gen(dirent, struct repo_dirent, 4096)
+
+static uint32_t active_commit;
+static uint32_t mark;
+
+static int repo_dirent_name_cmp(const void *a, const void *b);
+
+/* Treap for directory entries */
+trp_gen(static, dirent_, struct repo_dirent, children, dirent, repo_dirent_name_cmp);
+
+uint32_t next_blob_mark(void)
+{
+	return mark++;
+}
+
+static struct repo_dir *repo_commit_root_dir(struct repo_commit *commit)
+{
+	return dir_pointer(commit->root_dir_offset);
+}
+
+static struct repo_dirent *repo_first_dirent(struct repo_dir *dir)
+{
+	return dirent_first(&dir->entries);
+}
+
+static int repo_dirent_name_cmp(const void *a, const void *b)
+{
+	const struct repo_dirent *dirent1 = a, *dirent2 = b;
+	uint32_t a_offset = dirent1->name_offset;
+	uint32_t b_offset = dirent2->name_offset;
+	return (a_offset > b_offset) - (a_offset < b_offset);
+}
+
+static int repo_dirent_is_dir(struct repo_dirent *dirent)
+{
+	return dirent != NULL && dirent->mode == REPO_MODE_DIR;
+}
+
+static struct repo_dir *repo_dir_from_dirent(struct repo_dirent *dirent)
+{
+	if (!repo_dirent_is_dir(dirent))
+		return NULL;
+	return dir_pointer(dirent->content_offset);
+}
+
+static struct repo_dir *repo_clone_dir(struct repo_dir *orig_dir)
+{
+	uint32_t orig_o, new_o;
+	orig_o = dir_offset(orig_dir);
+	if (orig_o >= dir_pool.committed)
+		return orig_dir;
+	new_o = dir_alloc(1);
+	orig_dir = dir_pointer(orig_o);
+	*dir_pointer(new_o) = *orig_dir;
+	return dir_pointer(new_o);
+}
+
+static struct repo_dirent *repo_read_dirent(uint32_t revision, uint32_t *path)
+{
+	uint32_t name = 0;
+	struct repo_dirent *key = dirent_pointer(dirent_alloc(1));
+	struct repo_dir *dir = NULL;
+	struct repo_dirent *dirent = NULL;
+	dir = repo_commit_root_dir(commit_pointer(revision));
+	while (~(name = *path++)) {
+		key->name_offset = name;
+		dirent = dirent_search(&dir->entries, key);
+		if (dirent == NULL || !repo_dirent_is_dir(dirent))
+			break;
+		dir = repo_dir_from_dirent(dirent);
+	}
+	dirent_free(1);
+	return dirent;
+}
+
+static void repo_write_dirent(uint32_t *path, uint32_t mode,
+			      uint32_t content_offset, uint32_t del)
+{
+	uint32_t name, revision, dir_o = ~0, parent_dir_o = ~0;
+	struct repo_dir *dir;
+	struct repo_dirent *key;
+	struct repo_dirent *dirent = NULL;
+	revision = active_commit;
+	dir = repo_commit_root_dir(commit_pointer(revision));
+	dir = repo_clone_dir(dir);
+	commit_pointer(revision)->root_dir_offset = dir_offset(dir);
+	while (~(name = *path++)) {
+		parent_dir_o = dir_offset(dir);
+
+		key = dirent_pointer(dirent_alloc(1));
+		key->name_offset = name;
+
+		dirent = dirent_search(&dir->entries, key);
+		if (dirent == NULL)
+			dirent = key;
+		else
+			dirent_free(1);
+
+		if (dirent == key) {
+			dirent->mode = REPO_MODE_DIR;
+			dirent->content_offset = 0;
+			dirent_insert(&dir->entries, dirent);
+		}
+
+		if (dirent_offset(dirent) < dirent_pool.committed) {
+			dir_o = repo_dirent_is_dir(dirent) ?
+					dirent->content_offset : ~0;
+			dirent_remove(&dir->entries, dirent);
+			dirent = dirent_pointer(dirent_alloc(1));
+			dirent->name_offset = name;
+			dirent->mode = REPO_MODE_DIR;
+			dirent->content_offset = dir_o;
+			dirent_insert(&dir->entries, dirent);
+		}
+
+		dir = repo_dir_from_dirent(dirent);
+		dir = repo_clone_dir(dir);
+		dirent->content_offset = dir_offset(dir);
+	}
+	if (dirent == NULL)
+		return;
+	dirent->mode = mode;
+	dirent->content_offset = content_offset;
+	if (del && ~parent_dir_o)
+		dirent_remove(&dir_pointer(parent_dir_o)->entries, dirent);
+}
+
+uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst)
+{
+	uint32_t mode = 0, content_offset = 0;
+	struct repo_dirent *src_dirent;
+	src_dirent = repo_read_dirent(revision, src);
+	if (src_dirent != NULL) {
+		mode = src_dirent->mode;
+		content_offset = src_dirent->content_offset;
+		repo_write_dirent(dst, mode, content_offset, 0);
+	}
+	return mode;
+}
+
+void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark)
+{
+	repo_write_dirent(path, mode, blob_mark, 0);
+}
+
+uint32_t repo_replace(uint32_t *path, uint32_t blob_mark)
+{
+	uint32_t mode = 0;
+	struct repo_dirent *src_dirent;
+	src_dirent = repo_read_dirent(active_commit, path);
+	if (src_dirent != NULL) {
+		mode = src_dirent->mode;
+		repo_write_dirent(path, mode, blob_mark, 0);
+	}
+	return mode;
+}
+
+void repo_modify(uint32_t *path, uint32_t mode, uint32_t blob_mark)
+{
+	struct repo_dirent *src_dirent;
+	src_dirent = repo_read_dirent(active_commit, path);
+	if (src_dirent != NULL && blob_mark == 0)
+		blob_mark = src_dirent->content_offset;
+	repo_write_dirent(path, mode, blob_mark, 0);
+}
+
+void repo_delete(uint32_t *path)
+{
+	repo_write_dirent(path, 0, 0, 1);
+}
+
+static void repo_git_add_r(uint32_t depth, uint32_t *path, struct repo_dir *dir);
+
+static void repo_git_add(uint32_t depth, uint32_t *path, struct repo_dirent *dirent)
+{
+	if (repo_dirent_is_dir(dirent))
+		repo_git_add_r(depth, path, repo_dir_from_dirent(dirent));
+	else
+		fast_export_modify(depth, path,
+				   dirent->mode, dirent->content_offset);
+}
+
+static void repo_git_add_r(uint32_t depth, uint32_t *path, struct repo_dir *dir)
+{
+	struct repo_dirent *de = repo_first_dirent(dir);
+	while (de) {
+		path[depth] = de->name_offset;
+		repo_git_add(depth + 1, path, de);
+		de = dirent_next(&dir->entries, de);
+	}
+}
+
+static void repo_diff_r(uint32_t depth, uint32_t *path, struct repo_dir *dir1,
+			struct repo_dir *dir2)
+{
+	struct repo_dirent *de1, *de2;
+	de1 = repo_first_dirent(dir1);
+	de2 = repo_first_dirent(dir2);
+
+	while (de1 && de2) {
+		if (de1->name_offset < de2->name_offset) {
+			path[depth] = de1->name_offset;
+			fast_export_delete(depth + 1, path);
+			de1 = dirent_next(&dir1->entries, de1);
+			continue;
+		}
+		if (de1->name_offset > de2->name_offset) {
+			path[depth] = de2->name_offset;
+			repo_git_add(depth + 1, path, de2);
+			de2 = dirent_next(&dir2->entries, de2);
+			continue;
+		}
+		path[depth] = de1->name_offset;
+
+		if (de1->mode == de2->mode &&
+		    de1->content_offset == de2->content_offset) {
+			; /* No change. */
+		} else if (repo_dirent_is_dir(de1) && repo_dirent_is_dir(de2)) {
+			repo_diff_r(depth + 1, path,
+				    repo_dir_from_dirent(de1),
+				    repo_dir_from_dirent(de2));
+		} else if (!repo_dirent_is_dir(de1) && !repo_dirent_is_dir(de2)) {
+			repo_git_add(depth + 1, path, de2);
+		} else {
+			fast_export_delete(depth + 1, path);
+			repo_git_add(depth + 1, path, de2);
+		}
+		de1 = dirent_next(&dir1->entries, de1);
+		de2 = dirent_next(&dir2->entries, de2);
+	}
+	while (de1) {
+		path[depth] = de1->name_offset;
+		fast_export_delete(depth + 1, path);
+		de1 = dirent_next(&dir1->entries, de1);
+	}
+	while (de2) {
+		path[depth] = de2->name_offset;
+		repo_git_add(depth + 1, path, de2);
+		de2 = dirent_next(&dir2->entries, de2);
+	}
+}
+
+static uint32_t path_stack[REPO_MAX_PATH_DEPTH];
+
+void repo_diff(uint32_t r1, uint32_t r2)
+{
+	repo_diff_r(0,
+		    path_stack,
+		    repo_commit_root_dir(commit_pointer(r1)),
+		    repo_commit_root_dir(commit_pointer(r2)));
+}
+
+void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
+		 uint32_t url, unsigned long timestamp)
+{
+	fast_export_commit(revision, author, log, uuid, url, timestamp);
+	dirent_commit();
+	dir_commit();
+	active_commit = commit_alloc(1);
+	commit_pointer(active_commit)->root_dir_offset =
+		commit_pointer(active_commit - 1)->root_dir_offset;
+}
+
+static void mark_init(void)
+{
+	uint32_t i;
+	mark = 0;
+	for (i = 0; i < dirent_pool.size; i++)
+		if (!repo_dirent_is_dir(dirent_pointer(i)) &&
+		    dirent_pointer(i)->content_offset > mark)
+			mark = dirent_pointer(i)->content_offset;
+	mark++;
+}
+
+void repo_init() {
+	mark_init();
+	if (commit_pool.size == 0) {
+		/* Create empty tree for commit 0. */
+		commit_alloc(1);
+		commit_pointer(0)->root_dir_offset = dir_alloc(1);
+		dir_pointer(0)->entries.trp_root = ~0;
+		dir_commit();
+	}
+	/* Preallocate next commit, ready for changes. */
+	active_commit = commit_alloc(1);
+	commit_pointer(active_commit)->root_dir_offset =
+		commit_pointer(active_commit - 1)->root_dir_offset;
+}
+
+void repo_reset(void)
+{
+	pool_reset();
+	commit_reset();
+	dir_reset();
+	dirent_reset();
+}
diff --git a/vcs-svn/repo_tree.h b/vcs-svn/repo_tree.h
new file mode 100644
index 0000000..5476175
--- /dev/null
+++ b/vcs-svn/repo_tree.h
@@ -0,0 +1,26 @@
+#ifndef REPO_TREE_H_
+#define REPO_TREE_H_
+
+#include "git-compat-util.h"
+
+#define REPO_MODE_DIR 0040000
+#define REPO_MODE_BLB 0100644
+#define REPO_MODE_EXE 0100755
+#define REPO_MODE_LNK 0120000
+
+#define REPO_MAX_PATH_LEN 4096
+#define REPO_MAX_PATH_DEPTH 1000
+
+uint32_t next_blob_mark(void);
+uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst);
+void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark);
+uint32_t repo_replace(uint32_t *path, uint32_t blob_mark);
+void repo_modify(uint32_t *path, uint32_t mode, uint32_t blob_mark);
+void repo_delete(uint32_t *path);
+void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
+		 uint32_t url, long unsigned timestamp);
+void repo_diff(uint32_t r1, uint32_t r2);
+void repo_init(void);
+void repo_reset(void);
+
+#endif
-- 
1.7.2.1.544.ga752d.dirty


* [PATCH 08/10] SVN dump parser
  2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
                       ` (6 preceding siblings ...)
  2010-08-09 22:48     ` [PATCH 07/10] Infrastructure to write revisions in fast-export format Jonathan Nieder
@ 2010-08-09 22:55     ` Jonathan Nieder
  2010-08-12 17:22       ` Junio C Hamano
  2010-08-09 22:55     ` [PATCH 09/10] Update svn-fe manual Jonathan Nieder
                       ` (2 subsequent siblings)
  10 siblings, 1 reply; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-09 22:55 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

From: David Barr <david.barr@cordelta.com>

svndump parses data that is in SVN dumpfile format produced by
`svnadmin dump` with the help of line_buffer and uses repo_tree and
fast_export to emit a git fast-import stream.

Based roughly on com.hydrografix.svndump 0.92 from the SvnToCCase
project at <http://svn2cc.sarovar.org/>, by Stefan Hegny and
others.

[rr: allow input from files other than stdin]
[jn: with test, more error reporting]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
New test.  It is slow; work by svn gurus to speed this up would
be nice.  The test program is very similar to svn-fe from contrib,
except it exercises Ram’s change to read from a file other than
stdin.
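
For readers skimming the parser: the header handling in svndump_read()
below boils down to splitting each "Key: value" line in place at the
first ": ".  A minimal sketch of just that step, assuming nothing beyond
libc (split_header() is a made-up name for illustration and is not part
of vcs-svn):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split a dump header line such as "Node-path: trunk/README" in place.
 * On success `line` holds the key and the returned pointer is the value.
 * Returns NULL when the line has no ": " separator.
 * split_header() is a hypothetical helper, not part of vcs-svn.
 */
char *split_header(char *line)
{
	char *val;

	assert(line != NULL);
	val = strstr(line, ": ");
	if (!val)
		return NULL;
	*val++ = '\0';	/* terminate the key where the ':' was */
	*val++ = '\0';	/* overwrite the space; val now points at the value */
	return val;
}
```

Lines without a separator (blank lines, "PROPS-END") come back as NULL,
which matches the "continue" in the real loop.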

 .gitignore              |    1 +
 Makefile                |    8 +-
 contrib/svn-fe/svn-fe.c |    1 +
 t/t9010-svn-fe.sh       |   32 +++++
 test-svn-fe.c           |   18 +++
 vcs-svn/LICENSE         |    4 +
 vcs-svn/svndump.c       |  302 +++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndump.h       |    9 ++
 8 files changed, 373 insertions(+), 2 deletions(-)
 create mode 100644 t/t9010-svn-fe.sh
 create mode 100644 test-svn-fe.c
 create mode 100644 vcs-svn/svndump.c
 create mode 100644 vcs-svn/svndump.h

diff --git a/.gitignore b/.gitignore
index 8c0512e..258723f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -175,6 +175,7 @@
 /test-sha1
 /test-sigchain
 /test-string-pool
+/test-svn-fe
 /test-treap
 /common-cmds.h
 *.tar.gz
diff --git a/Makefile b/Makefile
index b873399..6228f66 100644
--- a/Makefile
+++ b/Makefile
@@ -417,6 +417,7 @@ TEST_PROGRAMS_NEED_X += test-run-command
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sigchain
 TEST_PROGRAMS_NEED_X += test-string-pool
+TEST_PROGRAMS_NEED_X += test-svn-fe
 TEST_PROGRAMS_NEED_X += test-treap
 TEST_PROGRAMS_NEED_X += test-index-version
 
@@ -1745,7 +1746,7 @@ endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
 VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o \
-	vcs-svn/repo_tree.o vcs-svn/fast_export.o
+	vcs-svn/repo_tree.o vcs-svn/fast_export.o vcs-svn/svndump.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1870,7 +1871,8 @@ xdiff-interface.o $(XDIFF_OBJS): \
 
 $(VCSSVN_OBJS): \
 	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h \
-	vcs-svn/line_buffer.h vcs-svn/repo_tree.h vcs-svn/fast_export.h
+	vcs-svn/line_buffer.h vcs-svn/repo_tree.h vcs-svn/fast_export.h \
+	vcs-svn/svndump.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
@@ -2025,6 +2027,8 @@ test-parse-options$X: parse-options.o
 
 test-string-pool$X: vcs-svn/lib.a
 
+test-svn-fe$X: vcs-svn/lib.a
+
 .PRECIOUS: $(TEST_OBJS)
 
 test-%$X: test-%.o $(GITLIBS)
diff --git a/contrib/svn-fe/svn-fe.c b/contrib/svn-fe/svn-fe.c
index e9b9ba4..a2677b0 100644
--- a/contrib/svn-fe/svn-fe.c
+++ b/contrib/svn-fe/svn-fe.c
@@ -10,6 +10,7 @@ int main(int argc, char **argv)
 {
 	svndump_init(NULL);
 	svndump_read((argc > 1) ? argv[1] : NULL);
+	svndump_deinit();
 	svndump_reset();
 	return 0;
 }
diff --git a/t/t9010-svn-fe.sh b/t/t9010-svn-fe.sh
new file mode 100644
index 0000000..bf9bbd6
--- /dev/null
+++ b/t/t9010-svn-fe.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description='check svn dumpfile importer'
+
+. ./lib-git-svn.sh
+
+test_dump() {
+	label=$1
+	dump=$2
+	test_expect_success "$dump" '
+		svnadmin create "$label-svn" &&
+		svnadmin load "$label-svn" < "$TEST_DIRECTORY/$dump" &&
+		svn_cmd export "file://$(pwd)/$label-svn" "$label-svnco" &&
+		git init "$label-git" &&
+		test-svn-fe "$TEST_DIRECTORY/$dump" >"$label.fe" &&
+		(
+			cd "$label-git" &&
+			git fast-import < ../"$label.fe"
+		) &&
+		(
+			cd "$label-svnco" &&
+			git init &&
+			git add . &&
+			git fetch "../$label-git" master &&
+			git diff --exit-code FETCH_HEAD
+		)
+	'
+}
+
+test_dump simple t9111/svnsync.dump
+
+test_done
diff --git a/test-svn-fe.c b/test-svn-fe.c
new file mode 100644
index 0000000..616a474
--- /dev/null
+++ b/test-svn-fe.c
@@ -0,0 +1,18 @@
+/*
+ * test-svn-fe: Code to exercise the svn import lib
+ */
+
+#include "git-compat-util.h"
+#include "vcs-svn/svndump.h"
+
+int main(int argc, char *argv[])
+{
+	if (argc != 2)
+		usage("test-svn-fe <file>");
+	svndump_init(argv[1]);
+	svndump_read(NULL);
+	svndump_deinit();
+	svndump_reset();
+	return 0;
+}
+
diff --git a/vcs-svn/LICENSE b/vcs-svn/LICENSE
index a3d384c..0a5e3c4 100644
--- a/vcs-svn/LICENSE
+++ b/vcs-svn/LICENSE
@@ -4,6 +4,10 @@ All rights reserved.
 Copyright (C) 2008 Jason Evans <jasone@canonware.com>.
 All rights reserved.
 
+Copyright (C) 2005 Stefan Hegny, hydrografix Consulting GmbH,
+Frankfurt/Main, Germany
+and others, see http://svn2cc.sarovar.org
+
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions
 are met:
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
new file mode 100644
index 0000000..630eeb5
--- /dev/null
+++ b/vcs-svn/svndump.c
@@ -0,0 +1,302 @@
+/*
+ * Parse and rearrange a svnadmin dump.
+ * Create the dump with:
+ * svnadmin dump --incremental -r<startrev>:<endrev> <repository> >outfile
+ *
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "cache.h"
+#include "repo_tree.h"
+#include "fast_export.h"
+#include "line_buffer.h"
+#include "obj_pool.h"
+#include "string_pool.h"
+
+#define NODEACT_REPLACE 4
+#define NODEACT_DELETE 3
+#define NODEACT_ADD 2
+#define NODEACT_CHANGE 1
+#define NODEACT_UNKNOWN 0
+
+#define DUMP_CTX 0
+#define REV_CTX  1
+#define NODE_CTX 2
+
+#define LENGTH_UNKNOWN (~0)
+#define DATE_RFC2822_LEN 31
+
+/* Create memory pool for log messages */
+obj_pool_gen(log, char, 4096)
+
+static char* log_copy(uint32_t length, char *log)
+{
+	char *buffer;
+	log_free(log_pool.size);
+	buffer = log_pointer(log_alloc(length));
+	strncpy(buffer, log, length);
+	return buffer;
+}
+
+static struct {
+	uint32_t action, propLength, textLength, srcRev, srcMode, mark, type;
+	uint32_t src[REPO_MAX_PATH_DEPTH], dst[REPO_MAX_PATH_DEPTH];
+} node_ctx;
+
+static struct {
+	uint32_t revision, author;
+	unsigned long timestamp;
+	char *log;
+} rev_ctx;
+
+static struct {
+	uint32_t uuid, url;
+} dump_ctx;
+
+static struct {
+	uint32_t svn_log, svn_author, svn_date, svn_executable, svn_special, uuid,
+		revision_number, node_path, node_kind, node_action,
+		node_copyfrom_path, node_copyfrom_rev, text_content_length,
+		prop_content_length, content_length;
+} keys;
+
+static void reset_node_ctx(char *fname)
+{
+	node_ctx.type = 0;
+	node_ctx.action = NODEACT_UNKNOWN;
+	node_ctx.propLength = LENGTH_UNKNOWN;
+	node_ctx.textLength = LENGTH_UNKNOWN;
+	node_ctx.src[0] = ~0;
+	node_ctx.srcRev = 0;
+	node_ctx.srcMode = 0;
+	pool_tok_seq(REPO_MAX_PATH_DEPTH, node_ctx.dst, "/", fname);
+	node_ctx.mark = 0;
+}
+
+static void reset_rev_ctx(uint32_t revision)
+{
+	rev_ctx.revision = revision;
+	rev_ctx.timestamp = 0;
+	rev_ctx.log = NULL;
+	rev_ctx.author = ~0;
+}
+
+static void reset_dump_ctx(uint32_t url)
+{
+	dump_ctx.url = url;
+	dump_ctx.uuid = ~0;
+}
+
+static void init_keys(void)
+{
+	keys.svn_log = pool_intern("svn:log");
+	keys.svn_author = pool_intern("svn:author");
+	keys.svn_date = pool_intern("svn:date");
+	keys.svn_executable = pool_intern("svn:executable");
+	keys.svn_special = pool_intern("svn:special");
+	keys.uuid = pool_intern("UUID");
+	keys.revision_number = pool_intern("Revision-number");
+	keys.node_path = pool_intern("Node-path");
+	keys.node_kind = pool_intern("Node-kind");
+	keys.node_action = pool_intern("Node-action");
+	keys.node_copyfrom_path = pool_intern("Node-copyfrom-path");
+	keys.node_copyfrom_rev = pool_intern("Node-copyfrom-rev");
+	keys.text_content_length = pool_intern("Text-content-length");
+	keys.prop_content_length = pool_intern("Prop-content-length");
+	keys.content_length = pool_intern("Content-length");
+}
+
+static void read_props(void)
+{
+	uint32_t len;
+	uint32_t key = ~0;
+	char *val = NULL;
+	char *t;
+	while ((t = buffer_read_line()) && strcmp(t, "PROPS-END")) {
+		if (!strncmp(t, "K ", 2)) {
+			len = atoi(&t[2]);
+			key = pool_intern(buffer_read_string(len));
+			buffer_read_line();
+		} else if (!strncmp(t, "V ", 2)) {
+			len = atoi(&t[2]);
+			val = buffer_read_string(len);
+			if (key == keys.svn_log) {
+				/* Value length excludes terminating nul. */
+				rev_ctx.log = log_copy(len + 1, val);
+			} else if (key == keys.svn_author) {
+				rev_ctx.author = pool_intern(val);
+			} else if (key == keys.svn_date) {
+				if (parse_date_basic(val, &rev_ctx.timestamp, NULL))
+					fprintf(stderr, "Invalid timestamp: %s\n", val);
+			} else if (key == keys.svn_executable) {
+				node_ctx.type = REPO_MODE_EXE;
+			} else if (key == keys.svn_special) {
+				node_ctx.type = REPO_MODE_LNK;
+			}
+			key = ~0;
+			buffer_read_line();
+		}
+	}
+}
+
+static void handle_node(void)
+{
+	if (node_ctx.propLength != LENGTH_UNKNOWN && node_ctx.propLength)
+		read_props();
+
+	if (node_ctx.srcRev)
+		node_ctx.srcMode = repo_copy(node_ctx.srcRev, node_ctx.src, node_ctx.dst);
+
+	if (node_ctx.textLength != LENGTH_UNKNOWN &&
+	    node_ctx.type != REPO_MODE_DIR)
+		node_ctx.mark = next_blob_mark();
+
+	if (node_ctx.action == NODEACT_DELETE) {
+		repo_delete(node_ctx.dst);
+	} else if (node_ctx.action == NODEACT_CHANGE ||
+			   node_ctx.action == NODEACT_REPLACE) {
+		if (node_ctx.action == NODEACT_REPLACE &&
+		    node_ctx.type == REPO_MODE_DIR)
+			repo_replace(node_ctx.dst, node_ctx.mark);
+		else if (node_ctx.propLength != LENGTH_UNKNOWN)
+			repo_modify(node_ctx.dst, node_ctx.type, node_ctx.mark);
+		else if (node_ctx.textLength != LENGTH_UNKNOWN)
+			node_ctx.srcMode = repo_replace(node_ctx.dst, node_ctx.mark);
+	} else if (node_ctx.action == NODEACT_ADD) {
+		if (node_ctx.srcRev && node_ctx.propLength != LENGTH_UNKNOWN)
+			repo_modify(node_ctx.dst, node_ctx.type, node_ctx.mark);
+		else if (node_ctx.srcRev && node_ctx.textLength != LENGTH_UNKNOWN)
+			node_ctx.srcMode = repo_replace(node_ctx.dst, node_ctx.mark);
+		else if ((node_ctx.type == REPO_MODE_DIR && !node_ctx.srcRev) ||
+			 node_ctx.textLength != LENGTH_UNKNOWN)
+			repo_add(node_ctx.dst, node_ctx.type, node_ctx.mark);
+	}
+
+	if (node_ctx.propLength == LENGTH_UNKNOWN && node_ctx.srcMode)
+		node_ctx.type = node_ctx.srcMode;
+
+	if (node_ctx.mark)
+		fast_export_blob(node_ctx.type, node_ctx.mark, node_ctx.textLength);
+	else if (node_ctx.textLength != LENGTH_UNKNOWN)
+		buffer_skip_bytes(node_ctx.textLength);
+}
+
+static void handle_revision(void)
+{
+	if (rev_ctx.revision)
+		repo_commit(rev_ctx.revision, rev_ctx.author, rev_ctx.log,
+			dump_ctx.uuid, dump_ctx.url, rev_ctx.timestamp);
+}
+
+void svndump_read(const char *url)
+{
+	char *val;
+	char *t;
+	uint32_t active_ctx = DUMP_CTX;
+	uint32_t len;
+	uint32_t key;
+
+	reset_dump_ctx(pool_intern(url));
+	while ((t = buffer_read_line())) {
+		val = strstr(t, ": ");
+		if (!val)
+			continue;
+		*val++ = '\0';
+		*val++ = '\0';
+		key = pool_intern(t);
+
+		if (key == keys.uuid) {
+			dump_ctx.uuid = pool_intern(val);
+		} else if (key == keys.revision_number) {
+			if (active_ctx == NODE_CTX)
+				handle_node();
+			if (active_ctx != DUMP_CTX)
+				handle_revision();
+			active_ctx = REV_CTX;
+			reset_rev_ctx(atoi(val));
+		} else if (key == keys.node_path) {
+			if (active_ctx == NODE_CTX)
+				handle_node();
+			active_ctx = NODE_CTX;
+			reset_node_ctx(val);
+		} else if (key == keys.node_kind) {
+			if (!strcmp(val, "dir"))
+				node_ctx.type = REPO_MODE_DIR;
+			else if (!strcmp(val, "file"))
+				node_ctx.type = REPO_MODE_BLB;
+			else
+				fprintf(stderr, "Unknown node-kind: %s\n", val);
+		} else if (key == keys.node_action) {
+			if (!strcmp(val, "delete")) {
+				node_ctx.action = NODEACT_DELETE;
+			} else if (!strcmp(val, "add")) {
+				node_ctx.action = NODEACT_ADD;
+			} else if (!strcmp(val, "change")) {
+				node_ctx.action = NODEACT_CHANGE;
+			} else if (!strcmp(val, "replace")) {
+				node_ctx.action = NODEACT_REPLACE;
+			} else {
+				fprintf(stderr, "Unknown node-action: %s\n", val);
+				node_ctx.action = NODEACT_UNKNOWN;
+			}
+		} else if (key == keys.node_copyfrom_path) {
+			pool_tok_seq(REPO_MAX_PATH_DEPTH, node_ctx.src, "/", val);
+		} else if (key == keys.node_copyfrom_rev) {
+			node_ctx.srcRev = atoi(val);
+		} else if (key == keys.text_content_length) {
+			node_ctx.textLength = atoi(val);
+		} else if (key == keys.prop_content_length) {
+			node_ctx.propLength = atoi(val);
+		} else if (key == keys.content_length) {
+			len = atoi(val);
+			buffer_read_line();
+			if (active_ctx == REV_CTX) {
+				read_props();
+			} else if (active_ctx == NODE_CTX) {
+				handle_node();
+				active_ctx = REV_CTX;
+			} else {
+				fprintf(stderr, "Unexpected content length header: %d\n", len);
+				buffer_skip_bytes(len);
+			}
+		}
+	}
+	if (active_ctx == NODE_CTX)
+		handle_node();
+	if (active_ctx != DUMP_CTX)
+		handle_revision();
+}
+
+void svndump_init(const char *filename)
+{
+	buffer_init(filename);
+	repo_init();
+	reset_dump_ctx(~0);
+	reset_rev_ctx(0);
+	reset_node_ctx(NULL);
+	init_keys();
+}
+
+void svndump_deinit(void)
+{
+	log_reset();
+	repo_reset();
+	reset_dump_ctx(~0);
+	reset_rev_ctx(0);
+	reset_node_ctx(NULL);
+	if (buffer_deinit())
+		fprintf(stderr, "Input error\n");
+	if (ferror(stdout))
+		fprintf(stderr, "Output error\n");
+}
+
+void svndump_reset(void)
+{
+	log_reset();
+	buffer_reset();
+	repo_reset();
+	reset_dump_ctx(~0);
+	reset_rev_ctx(0);
+	reset_node_ctx(NULL);
+}
diff --git a/vcs-svn/svndump.h b/vcs-svn/svndump.h
new file mode 100644
index 0000000..93c412f
--- /dev/null
+++ b/vcs-svn/svndump.h
@@ -0,0 +1,9 @@
+#ifndef SVNDUMP_H_
+#define SVNDUMP_H_
+
+void svndump_init(const char *filename);
+void svndump_read(const char *url);
+void svndump_deinit(void);
+void svndump_reset(void);
+
+#endif
-- 
1.7.2.1.544.ga752d.dirty


* [PATCH 09/10] Update svn-fe manual
  2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
                       ` (7 preceding siblings ...)
  2010-08-09 22:55     ` [PATCH 08/10] SVN dump parser Jonathan Nieder
@ 2010-08-09 22:55     ` Jonathan Nieder
  2010-08-09 22:58     ` [PATCH 10/10] svn-fe manual: Clarify warning about deltas in dump files Jonathan Nieder
  2010-08-10 12:53     ` [PATCH 0/10] rr/svn-export reroll Ramkumar Ramachandra
  10 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-09 22:55 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

The svn-fe example does not litter the working directory with
.bin files any more (hoorah!).

The permissive error handling implies a known bug.  We should
be flagging iffy input and, even if we continue, reporting it
on exit.

Cc: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
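One possible shape for a later fix to the exit-status bug mentioned
above, as a rough sketch only.  The counter and both function names are
hypothetical; nothing like this exists in vcs-svn yet.  The idea is to
keep reporting problems on stderr and continuing, but remember that one
was seen so the caller can return non-zero:

```c
#include <stdio.h>

/* Hypothetical: number of problems seen while parsing; not in vcs-svn. */
static unsigned int input_errors;

/* Report a problem on stderr, keep going, and remember that it happened. */
void note_input_error(const char *what, const char *detail)
{
	fprintf(stderr, "%s: %s\n", what, detail);
	input_errors++;
}

/* Value a caller such as main() could return: 0 if clean, 1 otherwise. */
int exit_status(void)
{
	return input_errors ? 1 : 0;
}
```

With that, the existing fprintf(stderr, ...) call sites would call
note_input_error() instead, and main() would end with
"return exit_status();".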
 contrib/svn-fe/svn-fe.txt |   14 ++++++--------
 1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index de30f83..fb0ee56 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -43,11 +43,9 @@ user <user@UUID>
 as committer, where 'user' is the value of the `svn:author` property
 and 'UUID' the repository's identifier.
 
-To support incremental imports, 'svn-fe' will put a `git-svn-id`
-line at the end of each commit log message if passed an url on the
-command line.  This line has the form `git-svn-id: URL@REVNO UUID`.
-
-Empty directories and unknown properties are silently discarded.
+To support incremental imports, 'svn-fe' puts a `git-svn-id` line at
+the end of each commit log message if passed an url on the command
+line.  This line has the form `git-svn-id: URL@REVNO UUID`.
 
 The resulting repository will generally require further processing
 to put each project in its own repository and to separate the history
@@ -56,9 +54,9 @@ may be useful for this purpose.
 
 BUGS
 ----
-Litters the current working directory with .bin files for
-persistence. Will be fixed when the svn-fe infrastructure is aware of
-a Git working directory.
+Empty directories and unknown properties are silently discarded.
+
+The exit status does not reflect whether an error was detected.
 
 SEE ALSO
 --------
-- 
1.7.2.1.544.ga752d.dirty


* [PATCH 10/10] svn-fe manual: Clarify warning about deltas in dump files
  2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
                       ` (8 preceding siblings ...)
  2010-08-09 22:55     ` [PATCH 09/10] Update svn-fe manual Jonathan Nieder
@ 2010-08-09 22:58     ` Jonathan Nieder
  2010-08-10 12:53     ` [PATCH 0/10] rr/svn-export reroll Ramkumar Ramachandra
  10 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-09 22:58 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Those in the know would notice that dump file format version 2
means "svnadmin dump --no-deltas", but for the rest of us, an
explicit reminder is useful.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
That’s the end of the series.  Thanks for reading.

On the horizon, as you may have guessed, are changes to use
dumpfile format v3.  Doing so sanely requires two-way communication
with fast-import, I think (as discussed).  Ram has already put
together a prototype delta applier, so it seems to be mostly a matter
of plumbing now.
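
Until v3 support lands, a front end could at least notice a deltified
dump up front.  Dump files start with a "SVN-fs-dump-format-version: N"
header, so a check along these lines would do.  This is a sketch under
that assumption; both helper names are invented for illustration and
svn-fe does not currently perform this check:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return N from a "SVN-fs-dump-format-version: N" line,
 * or -1 if the line is not that header.  Hypothetical helper.
 */
int dump_format_version(const char *line)
{
	static const char prefix[] = "SVN-fs-dump-format-version: ";
	size_t n = sizeof(prefix) - 1;

	if (strncmp(line, prefix, n))
		return -1;
	return atoi(line + n);
}

/* 'svnadmin dump --deltas' writes format v3, which is not supported. */
int dump_uses_deltas(const char *line)
{
	return dump_format_version(line) >= 3;
}
```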

 contrib/svn-fe/svn-fe.txt |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index fb0ee56..35f84bd 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -12,7 +12,7 @@ svnadmin dump --incremental REPO | svn-fe [url] | git fast-import
 DESCRIPTION
 -----------
 
-Converts a Subversion dumpfile (version: 2) into input suitable for
+Converts a Subversion dumpfile into input suitable for
 git-fast-import(1) and similar importers. REPO is a path to a
 Subversion repository mirrored on the local disk. Remote Subversion
 repositories can be mirrored on local disk using the `svnsync`
@@ -25,6 +25,9 @@ Subversion's repository dump format is documented in full in
 Files in this format can be generated using the 'svnadmin dump' or
 'svk admin dump' command.
 
+Dumps produced with 'svnadmin dump --deltas' (dumpfile format v3)
+are not supported.
+
 OUTPUT FORMAT
 -------------
 The fast-import format is documented by the git-fast-import(1)
-- 
1.7.2.1.544.ga752d.dirty


* Re: [PATCH 0/10] rr/svn-export reroll
  2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
                       ` (9 preceding siblings ...)
  2010-08-09 22:58     ` [PATCH 10/10] svn-fe manual: Clarify warning about deltas in dump files Jonathan Nieder
@ 2010-08-10 12:53     ` Ramkumar Ramachandra
  2010-08-11  1:53       ` Jonathan Nieder
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
  10 siblings, 2 replies; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-08-10 12:53 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Hi Jonathan,

Jonathan Nieder writes:
> svn-fe has some serious changes on the horizon.  As a preparation,
> let’s round up what we have now.
> 
> The most controversial change is probably the new svn-fe test, which
> takes about 15 seconds (for the “svnadmin load”, not the svn-fe
> step :)).  It is in the t9* series, so hopefully that will not
> dissuade people from running the earlier tests.

I'll comment on this separately.

> The main highlight in the changes is a new
> 
> 	Input error
> 
> to stderr if a system call failed in reading in the dump file.
> It still returns status 0 in this and other error situations,
> though.

I'll comment on this separately.

> Based on maint (for no good reason; that’s just where I tried it).
> Intended to replace rr/svn-export in pu (only if Ram likes it, of
> course).

Thanks for re-rolling (again)! You've also added a note to the commit
messages briefly explaining what each contributor has done. I'd
expected some incremental patches instead of a full re-roll, but
whatever works is good :)

> David Barr (5):
>   Add memory pool library
>   Add string-specific memory pool
>   Add stream helper library
>   Infrastructure to write revisions in fast-export format
>   SVN dump parser
>
> Jason Evans (1):
>   Add treap implementation
>
> Jonathan Nieder (4):
>   Introduce vcs-svn lib

All these are good :)

>   Export parse_date_basic() to convert a date string to timestamp

Wasn't this ejected from this series and made a separate patch?

>   Update svn-fe manual

Removed the BUG since we've turned off persistence.

>   svn-fe manual: Clarify warning about deltas in dumpfiles

We have to fix this real soon. I'm waiting for the weekend so I get
some solid chunks of hacking time.

-- Ram


* Re: [PATCH 0/10] rr/svn-export reroll
  2010-08-10 12:53     ` [PATCH 0/10] rr/svn-export reroll Ramkumar Ramachandra
@ 2010-08-11  1:53       ` Jonathan Nieder
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
  1 sibling, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-11  1:53 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Junio C Hamano

Ramkumar Ramachandra wrote:

>                                                             I'd
> expected some incremental patches instead of a full re-roll, but
> whatever works is good :)

Yeah, I think after this series the topic has stabilized enough
to build on. :)

> All these are good :)

Thanks for checking.


* Re: [PATCH 05/10] Add string-specific memory pool
  2010-08-09 22:34     ` [PATCH 05/10] Add string-specific memory pool Jonathan Nieder
@ 2010-08-12 17:22       ` Junio C Hamano
  2010-08-12 21:30         ` Jonathan Nieder
  0 siblings, 1 reply; 79+ messages in thread
From: Junio C Hamano @ 2010-08-12 17:22 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ramkumar Ramachandra, Git Mailing List, David Michael Barr,
	Sverre Rabbelier

Jonathan Nieder <jrnieder@gmail.com> writes:

> diff --git a/Makefile b/Makefile
> index e7c33ec..24103c9 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -415,6 +415,7 @@ TEST_PROGRAMS_NEED_X += test-path-utils
>  TEST_PROGRAMS_NEED_X += test-run-command
>  TEST_PROGRAMS_NEED_X += test-sha1
>  TEST_PROGRAMS_NEED_X += test-sigchain
> +TEST_PROGRAMS_NEED_X += test-string-pool
>  TEST_PROGRAMS_NEED_X += test-treap
>  TEST_PROGRAMS_NEED_X += test-index-version

Does your Makefile do the right thing to vcs-svn/*.[oa] upon "make clean"?

> diff --git a/vcs-svn/string_pool.c b/vcs-svn/string_pool.c
> new file mode 100644
> index 0000000..550f0e5
> --- /dev/null
> +++ b/vcs-svn/string_pool.c
> @@ -0,0 +1,102 @@
> ...
> +uint32_t pool_intern(const char *key)
> +{
> +	/* Canonicalize key */
> +	struct node *match = NULL;
> +	uint32_t key_len;
> +	if (key == NULL)
> +		return ~0;
> +	key_len = strlen(key) + 1;
> +	struct node *node = node_pointer(node_alloc(1));

Please fix decl-after-stmt here.
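
For context: git's C style keeps every declaration at the top of its
block, and gcc can flag violations with -Wdeclaration-after-statement.
A minimal sketch of the shape such a fix might take follows. The
function key_size() is a hypothetical stand-in, not the real
pool_intern(); only the ordering of declarations and statements is the
point.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for pool_intern(): returns the key length
 * including the terminating NUL, or ~0 for a NULL key.
 */
static uint32_t key_size(const char *key)
{
	/* All declarations come first ... */
	uint32_t key_len;
	uint32_t result;

	/* ... and only then the statements. */
	if (key == NULL)
		return ~0;
	key_len = strlen(key) + 1;
	result = key_len;
	return result;
}
```

In the quoted hunk the same idea would presumably mean declaring
"struct node *node" next to "match" and assigning it after the NULL
check.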

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 04/10] Add treap implementation
  2010-08-09 22:17     ` [PATCH 04/10] Add treap implementation Jonathan Nieder
@ 2010-08-12 17:22       ` Junio C Hamano
  2010-08-12 22:02         ` Jonathan Nieder
  2010-08-12 22:11         ` Jonathan Nieder
  0 siblings, 2 replies; 79+ messages in thread
From: Junio C Hamano @ 2010-08-12 17:22 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ramkumar Ramachandra, Git Mailing List, David Michael Barr,
	Sverre Rabbelier, Junio C Hamano

Jonathan Nieder <jrnieder@gmail.com> writes:

> +/* Left accessors. */
> +#define trp_left_get(a_base, a_field, a_node) \
> +	(trpn_pointer(a_base, a_node)->a_field.trpn_left)
> +#define trp_left_set(a_base, a_field, a_node, a_left) \
> +	do { \
> +		trpn_modify(a_base, a_node); \
> +		trp_left_get(a_base, a_field, a_node) = (a_left); \
> +	} while(0)

Need SP after "while" (there are other occurrences).

> +a_attr a_type MAYBE_UNUSED *a_pre##search(struct trp_root *treap, a_type *key) \
> +{ \
> +	int cmp; \
> +	uint32_t ret = treap->trp_root; \
> +	while (~ret && (cmp = (a_cmp)(key, trpn_pointer(a_base,ret)))) { \

SP after "," (same for nsearch)

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 08/10] SVN dump parser
  2010-08-09 22:55     ` [PATCH 08/10] SVN dump parser Jonathan Nieder
@ 2010-08-12 17:22       ` Junio C Hamano
  0 siblings, 0 replies; 79+ messages in thread
From: Junio C Hamano @ 2010-08-12 17:22 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ramkumar Ramachandra, Git Mailing List, David Michael Barr,
	Sverre Rabbelier

Jonathan Nieder <jrnieder@gmail.com> writes:

> diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
> new file mode 100644
> index 0000000..630eeb5
> --- /dev/null
> +++ b/vcs-svn/svndump.c
> @@ -0,0 +1,302 @@
> ...
> +static char* log_copy(uint32_t length, char *log)

Style: static char *log_copy(...)

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 05/10] Add string-specific memory pool
  2010-08-12 17:22       ` Junio C Hamano
@ 2010-08-12 21:30         ` Jonathan Nieder
  0 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-12 21:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ramkumar Ramachandra, Git Mailing List, David Michael Barr,
	Sverre Rabbelier

Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>> diff --git a/Makefile b/Makefile
>> index e7c33ec..24103c9 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -415,6 +415,7 @@ TEST_PROGRAMS_NEED_X += test-path-utils
>>  TEST_PROGRAMS_NEED_X += test-run-command
>>  TEST_PROGRAMS_NEED_X += test-sha1
>>  TEST_PROGRAMS_NEED_X += test-sigchain
>> +TEST_PROGRAMS_NEED_X += test-string-pool
>>  TEST_PROGRAMS_NEED_X += test-treap
>>  TEST_PROGRAMS_NEED_X += test-index-version
>
> Does your Makefile do the right thing to vcs-svn/*.[oa] upon "make clean"?

Good catch.  Here’s a fixup for patch 2 (“Introduce vcs-svn lib”).

-- 8< --
Subject: vcs-svn: remove build artifacts on “make clean”

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
diff --git a/Makefile b/Makefile
index 2418820..24c4b3d 100644
--- a/Makefile
+++ b/Makefile
@@ -2184,8 +2184,8 @@ distclean: clean
 	$(RM) configure
 
 clean:
-	$(RM) *.o block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o xdiff/*.o \
-		builtin/*.o $(LIB_FILE) $(XDIFF_LIB)
+	$(RM) *.o block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o xdiff/*.o vcs-svn/*.o \
+		builtin/*.o $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) -r bin-wrappers
-- 

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH 04/10] Add treap implementation
  2010-08-12 17:22       ` Junio C Hamano
@ 2010-08-12 22:02         ` Jonathan Nieder
  2010-08-12 22:11         ` Jonathan Nieder
  1 sibling, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-12 22:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ramkumar Ramachandra, Git Mailing List, David Michael Barr,
	Sverre Rabbelier

Junio C Hamano wrote:

> Need SP after "while" (there are other occurrences).

Good catch.  checkpatch also notices some long lines, but I think
that’s worth ignoring.

-- 8< --
Subject: treap: style fix

Missing spaces in while (0) and trpn_pointer(a, b).

Remove parentheses around return value.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/trp.h |   30 +++++++++++++++---------------
 1 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/vcs-svn/trp.h b/vcs-svn/trp.h
index 1f5f51f..ee35c68 100644
--- a/vcs-svn/trp.h
+++ b/vcs-svn/trp.h
@@ -37,7 +37,7 @@ struct trp_root {
 			*trpn_pointer(a_base, a_offset) = \
 				*trpn_pointer(a_base, old_offset); \
 		} \
-	} while (0);
+	} while (0)
 
 /* Left accessors. */
 #define trp_left_get(a_base, a_field, a_node) \
@@ -46,7 +46,7 @@ struct trp_root {
 	do { \
 		trpn_modify(a_base, a_node); \
 		trp_left_get(a_base, a_field, a_node) = (a_left); \
-	} while(0)
+	} while (0)
 
 /* Right accessors. */
 #define trp_right_get(a_base, a_field, a_node) \
@@ -55,7 +55,7 @@ struct trp_root {
 	do { \
 		trpn_modify(a_base, a_node); \
 		trp_right_get(a_base, a_field, a_node) = (a_right); \
-	} while(0)
+	} while (0)
 
 /*
  * Fibonacci hash function.
@@ -72,7 +72,7 @@ struct trp_root {
 	do { \
 		trp_left_set(a_base, a_field, (a_node), ~0); \
 		trp_right_set(a_base, a_field, (a_node), ~0); \
-	} while(0)
+	} while (0)
 
 /* Internal utility macros. */
 #define trpn_first(a_base, a_field, a_root, r_node) \
@@ -90,7 +90,7 @@ struct trp_root {
 		trp_right_set(a_base, a_field, (a_node), \
 			trp_left_get(a_base, a_field, (r_node))); \
 		trp_left_set(a_base, a_field, (r_node), (a_node)); \
-	} while(0)
+	} while (0)
 
 #define trpn_rotate_right(a_base, a_field, a_node, r_node) \
 	do { \
@@ -98,7 +98,7 @@ struct trp_root {
 		trp_left_set(a_base, a_field, (a_node), \
 			trp_right_get(a_base, a_field, (r_node))); \
 		trp_right_set(a_base, a_field, (r_node), (a_node)); \
-	} while(0)
+	} while (0)
 
 #define trp_gen(a_attr, a_pre, a_type, a_field, a_base, a_cmp) \
 a_attr a_type MAYBE_UNUSED *a_pre##first(struct trp_root *treap) \
@@ -136,7 +136,7 @@ a_attr a_type MAYBE_UNUSED *a_pre##search(struct trp_root *treap, a_type *key) \
 { \
 	int cmp; \
 	uint32_t ret = treap->trp_root; \
-	while (~ret && (cmp = (a_cmp)(key, trpn_pointer(a_base,ret)))) { \
+	while (~ret && (cmp = (a_cmp)(key, trpn_pointer(a_base, ret)))) { \
 		if (cmp < 0) { \
 			ret = trp_left_get(a_base, a_field, ret); \
 		} else { \
@@ -149,7 +149,7 @@ a_attr a_type MAYBE_UNUSED *a_pre##nsearch(struct trp_root *treap, a_type *key)
 { \
 	int cmp; \
 	uint32_t ret = treap->trp_root; \
-	while (~ret && (cmp = (a_cmp)(key, trpn_pointer(a_base,ret)))) { \
+	while (~ret && (cmp = (a_cmp)(key, trpn_pointer(a_base, ret)))) { \
 		if (cmp < 0) { \
 			if (!~trp_left_get(a_base, a_field, ret)) \
 				break; \
@@ -163,7 +163,7 @@ a_attr a_type MAYBE_UNUSED *a_pre##nsearch(struct trp_root *treap, a_type *key)
 a_attr uint32_t MAYBE_UNUSED a_pre##insert_recurse(uint32_t cur_node, uint32_t ins_node) \
 { \
 	if (cur_node == ~0) { \
-		return (ins_node); \
+		return ins_node; \
 	} else { \
 		uint32_t ret; \
 		int cmp = (a_cmp)(trpn_pointer(a_base, ins_node), \
@@ -185,7 +185,7 @@ a_attr uint32_t MAYBE_UNUSED a_pre##insert_recurse(uint32_t cur_node, uint32_t i
 			else \
 				ret = cur_node; \
 		} \
-		return (ret); \
+		return ret; \
 	} \
 } \
 a_attr void MAYBE_UNUSED a_pre##insert(struct trp_root *treap, a_type *node) \
@@ -204,27 +204,27 @@ a_attr uint32_t MAYBE_UNUSED a_pre##remove_recurse(uint32_t cur_node, uint32_t r
 		uint32_t right = trp_right_get(a_base, a_field, cur_node); \
 		if (left == ~0) { \
 			if (right == ~0) \
-				return (~0); \
+				return ~0; \
 		} else if (right == ~0 || trp_prio_get(left) < trp_prio_get(right)) { \
 			trpn_rotate_right(a_base, a_field, cur_node, ret); \
 			right = a_pre##remove_recurse(cur_node, rem_node); \
 			trp_right_set(a_base, a_field, ret, right); \
-			return (ret); \
+			return ret; \
 		} \
 		trpn_rotate_left(a_base, a_field, cur_node, ret); \
 		left = a_pre##remove_recurse(cur_node, rem_node); \
 		trp_left_set(a_base, a_field, ret, left); \
-		return (ret); \
+		return ret; \
 	} else if (cmp < 0) { \
 		uint32_t left = a_pre##remove_recurse( \
 			trp_left_get(a_base, a_field, cur_node), rem_node); \
 		trp_left_set(a_base, a_field, cur_node, left); \
-		return (cur_node); \
+		return cur_node; \
 	} else { \
 		uint32_t right = a_pre##remove_recurse( \
 			trp_right_get(a_base, a_field, cur_node), rem_node); \
 		trp_right_set(a_base, a_field, cur_node, right); \
-		return (cur_node); \
+		return cur_node; \
 	} \
 } \
 a_attr void MAYBE_UNUSED a_pre##remove(struct trp_root *treap, a_type *node) \
-- 
1.7.2.1.544.ga752d.dirty

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH 04/10] Add treap implementation
  2010-08-12 17:22       ` Junio C Hamano
  2010-08-12 22:02         ` Jonathan Nieder
@ 2010-08-12 22:11         ` Jonathan Nieder
  2010-08-12 22:44           ` Junio C Hamano
  1 sibling, 1 reply; 79+ messages in thread
From: Jonathan Nieder @ 2010-08-12 22:11 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ramkumar Ramachandra, Git Mailing List, David Michael Barr,
	Sverre Rabbelier

Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>> +#define trp_left_set(a_base, a_field, a_node, a_left) \
>> +	do { \
>> +		trpn_modify(a_base, a_node); \
>> +		trp_left_get(a_base, a_field, a_node) = (a_left); \
>> +	} while(0)
>
> Need SP after "while" (there are other occurrences).

Here are a few more (but feel free to ignore them).

-- 8< --
Subject: Standardize do { ... } while (0) style

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 base85.c   |    6 +++---
 cache.h    |    2 +-
 diffcore.h |    8 ++++----
 http.h     |    4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/base85.c b/base85.c
index e459fee..781b575 100644
--- a/base85.c
+++ b/base85.c
@@ -7,9 +7,9 @@
 #define say1(a,b) fprintf(stderr, a, b)
 #define say2(a,b,c) fprintf(stderr, a, b, c)
 #else
-#define say(a) do {} while(0)
-#define say1(a,b) do {} while(0)
-#define say2(a,b,c) do {} while(0)
+#define say(a) do { /* nothing */ } while (0)
+#define say1(a,b) do { /* nothing */ } while (0)
+#define say2(a,b,c) do { /* nothing */ } while (0)
 #endif
 
 static const char en85[] = {
diff --git a/cache.h b/cache.h
index 68258be..37ef9d8 100644
--- a/cache.h
+++ b/cache.h
@@ -449,7 +449,7 @@ extern int init_db(const char *template_dir, unsigned int flags);
 				alloc = alloc_nr(alloc); \
 			x = xrealloc((x), alloc * sizeof(*(x))); \
 		} \
-	} while(0)
+	} while (0)
 
 /* Initialize and use the cache information */
 extern int read_index(struct index_state *);
diff --git a/diffcore.h b/diffcore.h
index fed9b15..05ebc11 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -98,7 +98,7 @@ struct diff_queue_struct {
 		(q)->queue = NULL; \
 		(q)->nr = (q)->alloc = 0; \
 		(q)->run = 0; \
-	} while(0);
+	} while (0)
 
 extern struct diff_queue_struct diff_queued_diff;
 extern struct diff_filepair *diff_queue(struct diff_queue_struct *,
@@ -118,9 +118,9 @@ void diff_debug_filespec(struct diff_filespec *, int, const char *);
 void diff_debug_filepair(const struct diff_filepair *, int);
 void diff_debug_queue(const char *, struct diff_queue_struct *);
 #else
-#define diff_debug_filespec(a,b,c) do {} while(0)
-#define diff_debug_filepair(a,b) do {} while(0)
-#define diff_debug_queue(a,b) do {} while(0)
+#define diff_debug_filespec(a,b,c) do { /* nothing */ } while (0)
+#define diff_debug_filepair(a,b) do { /* nothing */ } while (0)
+#define diff_debug_queue(a,b) do { /* nothing */ } while (0)
 #endif
 
 extern int diffcore_count_changes(struct diff_filespec *src,
diff --git a/http.h b/http.h
index a0b5901..173f74c 100644
--- a/http.h
+++ b/http.h
@@ -23,10 +23,10 @@
 #endif
 
 #if LIBCURL_VERSION_NUM < 0x070704
-#define curl_global_cleanup() do { /* nothing */ } while(0)
+#define curl_global_cleanup() do { /* nothing */ } while (0)
 #endif
 #if LIBCURL_VERSION_NUM < 0x070800
-#define curl_global_init(a) do { /* nothing */ } while(0)
+#define curl_global_init(a) do { /* nothing */ } while (0)
 #endif
 
 #if (LIBCURL_VERSION_NUM < 0x070c04) || (LIBCURL_VERSION_NUM == 0x071000)
-- 
1.7.2.1.544.ga752d.dirty

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH 04/10] Add treap implementation
  2010-08-12 22:11         ` Jonathan Nieder
@ 2010-08-12 22:44           ` Junio C Hamano
  0 siblings, 0 replies; 79+ messages in thread
From: Junio C Hamano @ 2010-08-12 22:44 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ramkumar Ramachandra, Git Mailing List, David Michael Barr,
	Sverre Rabbelier

Jonathan Nieder <jrnieder@gmail.com> writes:

> diff --git a/diffcore.h b/diffcore.h
> index fed9b15..05ebc11 100644
> --- a/diffcore.h
> +++ b/diffcore.h
> @@ -98,7 +98,7 @@ struct diff_queue_struct {
>  		(q)->queue = NULL; \
>  		(q)->nr = (q)->alloc = 0; \
>  		(q)->run = 0; \
> -	} while(0);
> +	} while (0)

This is a _bad_ one.  Thanks.
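
For readers wondering why the stray semicolon matters: a macro body
that ends in "while (0);" expands to two statements once the caller
adds their own semicolon, so it cannot sit between "if" and "else".
A minimal sketch, using a hypothetical RESET_COUNTER macro and struct
rather than the real diffcore.h definitions:

```c
struct counter {
	int nr;
	int alloc;
};

/*
 * Hypothetical macro for illustration.  There is no trailing
 * semicolon, so "RESET_COUNTER(c);" expands to exactly one statement.
 */
#define RESET_COUNTER(c) \
	do { \
		(c)->nr = 0; \
		(c)->alloc = 0; \
	} while (0)

/*
 * Returns 1 if the counter was reset, 0 if it was bumped instead.
 * Had the macro ended in "while (0);", the expansion would read
 * "do { ... } while (0);;" and the extra empty statement would
 * separate the "if" from its "else", which fails to compile.
 */
static int reset_or_bump(struct counter *c, int reset)
{
	if (reset)
		RESET_COUNTER(c);
	else
		c->nr++;
	return reset ? 1 : 0;
}
```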

^ permalink raw reply	[flat|nested] 79+ messages in thread

* [PATCH/WIP 00/16] svn delta applier
  2010-08-10 12:53     ` [PATCH 0/10] rr/svn-export reroll Ramkumar Ramachandra
  2010-08-11  1:53       ` Jonathan Nieder
@ 2010-10-11  2:34       ` Jonathan Nieder
  2010-10-11  2:37         ` [PATCH 01/16] vcs-svn: Eliminate global byte_buffer[] array Jonathan Nieder
                           ` (15 more replies)
  1 sibling, 16 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:34 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

Hi,

The svndiff format has proved more difficult to parse than expected.
This series documents the current state of things, and though it is
not complete, it should be ready for nitpicking by the masses.

Patches 1-4 modify the line_buffer API by introducing a struct
line_buffer to collect state that was previously held in global
variables.  Callers can use multiple line_buffers to manage input from
multiple files at a time.
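
To illustrate what per-buffer state buys, here is a rough sketch that
reads two inputs in lockstep.  It is not the vcs-svn code: struct
demo_line_buffer and the demo_* functions are simplified, hypothetical
stand-ins (no strbuf member, and they take an already open FILE *
instead of a filename).

```c
#include <stdio.h>
#include <string.h>

#define DEMO_LINE_LEN 10000

/* Simplified stand-in for struct line_buffer: one stream, one line. */
struct demo_line_buffer {
	char line[DEMO_LINE_LEN];
	FILE *infile;
};

/* Read a line without its trailing newline; NULL on EOF or error. */
static char *demo_read_line(struct demo_line_buffer *buf)
{
	size_t n;
	if (!fgets(buf->line, sizeof(buf->line), buf->infile))
		return NULL;
	n = strlen(buf->line);
	if (n && buf->line[n - 1] == '\n')
		buf->line[n - 1] = '\0';
	return buf->line;
}

/*
 * Count the leading lines two inputs have in common, reading both
 * sequentially.  Each input has its own buffer, so a read from one
 * never clobbers the line just read from the other.
 */
static int demo_common_prefix_lines(FILE *a, FILE *b)
{
	struct demo_line_buffer x, y;
	char *s, *t;
	int n = 0;

	x.infile = a;
	y.infile = b;
	while ((s = demo_read_line(&x)) && (t = demo_read_line(&y)) &&
	       !strcmp(s, t))
		n++;
	return n;
}
```

With a single set of global variables, only one input could be open at
a time, so this kind of interleaved reading was not possible.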

Patches 5-10 add various utility functions to the line_buffer API
(wrapping strbuf_fread(), fgetc(), etc).  Putting the helpers there
instead of having callers work with the FILE* directly means one
could easily

 - tweak the input stream (to insert "link: " at the beginning
   for symlinks?);
 - trace reads, for debugging; or
 - use read() directly in place of stdio and limit the number of bytes
   buffered

if one wants to.

Patch 11 adds a data structure and function to manage a "sliding
window" without using mmap() or fseek().  See the svndiff0 spec[1] for
how this would be used.

Patches 12 and 13 are some basic components for reading an svndiff0
file: reading variable-length integers and the opening magic bytes.
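
As a rough illustration of those two pieces: svndiff integers are
written in base 128, most significant digit first, with the high bit
set on every byte except the last, and a version 0 stream opens with
the bytes "SVN\0".  The sketch below works on an in-memory buffer; the
names has_svndiff0_magic() and parse_varint() are made up for this
example and are not the functions added by the patches.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if buf starts with the svndiff0 magic "SVN\0", else 0. */
static int has_svndiff0_magic(const unsigned char *buf, size_t len)
{
	static const unsigned char magic[4] = {'S', 'V', 'N', '\0'};
	return len >= sizeof(magic) && !memcmp(buf, magic, sizeof(magic));
}

/*
 * Parse one variable-length integer from buf into *out.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated or the value does not fit in 64 bits.
 */
static size_t parse_varint(const unsigned char *buf, size_t len,
			   uint64_t *out)
{
	uint64_t val = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		if (val >> 57)
			return 0;	/* shifting by 7 would lose bits */
		val = (val << 7) | (buf[i] & 0x7f);
		if (!(buf[i] & 0x80)) {
			/* High bit clear: this was the last byte. */
			*out = val;
			return i + 1;
		}
	}
	return 0;	/* ran out of input mid-integer */
}
```

For example, the two bytes 0x81 0x02 decode to 1 * 128 + 2 = 130.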

Patch 14 adds a compat helper for detecting unsigned overflow.

Patch 15 makes the svn-fe test usable on systems (like Ram's) without
libsvn-perl installed.  It also should make the test easier to read
for people unfamiliar with lib-git-svn.sh.

Patch 16 is the delta parser/applier.  This patch does _not_ add it to
contrib/svn-fe, even though that would be useful, since the
command-line interface is not set in stone yet.  If you want to try it
out, use the test-svn-fe command:

	test-svn-fe -d <preimage> <delta> <delta length>

The preimage or delta arg can be /dev/stdin for use in a pipeline.
Both are only read sequentially; they do not need to be regular files.

One of the test cases is enormous.  The svn delta lib doesn't use
multiple windows except when dealing with relatively big files, but
probably the test case should be replaced with a smaller, artificial
example.

One of the test cases does not pass.  I also don't know how to apply
the delta by hand --- it seems to have some extra bytes at the end. :(
Unfortunately the svndiff0 spec is not as clear about when to stop
reading as one might like.

The code separately maintains nominal and actual lengths for a few
buffers, since truncated input is permitted (and even required) in the
deltas svn produces, though the svndiff0 spec does not document the
semantics of that.

For svn-fe changes to take advantage of this code to handle the
dumpfilev3 format, see <git://github.com/barrbrain/git.git>[2].  So
now the full svnrdump | svn-fe | fast-import pipeline can be
experienced.  It still chokes on some deltas in the wild.

Thoughts, cleanups, test cases, bug reports, improvements welcome. :)

Enjoy,
Jonathan Nieder (15):
  vcs-svn: Eliminate global byte_buffer[] array
  vcs-svn: Replace buffer_read_string()'s memory pool with a strbuf
  vcs-svn: Collect line_buffer data in a struct
  vcs-svn: Teach line_buffer to handle multiple input files
  vcs-svn: Make buffer_skip_bytes() report partial reads
  vcs-svn: Better support for reading large files
  vcs-svn: Add binary-safe read() function
  vcs-svn: Let callers peek ahead to find stream end
  vcs-svn: Allow input errors to be detected early
  vcs-svn: Allow character-oriented input
  vcs-svn: Add code to maintain a sliding view of a file
  vcs-svn: Learn to parse variable-length integers
  vcs-svn: Learn to check for SVN\0 magic
  compat: helper for detecting unsigned overflow
  vcs-svn: Add svn delta parser

Ramkumar Ramachandra (1):
  t9010 (svn-fe): Eliminate dependency on svn perl bindings

 Makefile                 |    5 +-
 vcs-svn/line_buffer.txt  |    8 +-
 vcs-svn/fast_export.c    |    6 +-
 vcs-svn/fast_export.h    |    5 +-
 vcs-svn/line_buffer.c    |   99 +-
 vcs-svn/line_buffer.h    |   29 +-
 vcs-svn/sliding_window.c |   65 +
 vcs-svn/sliding_window.h |   14 +
 vcs-svn/svndiff.c        |  344 +
 vcs-svn/svndiff.h        |    9 +
 vcs-svn/svndump.c        |   29 +-
 vcs-svn/LICENSE          |    2 +
 git-compat-util.h        |    6 +
 test-line-buffer.c       |   17 +-
 test-svn-fe.c            |   37 +-
 t/t9010-svn-fe.sh        |   29 +-
 t/t9010/Xerces.cpp.diff0 |  Bin 0 -> 12185 bytes
 t/t9010/Xerces.cpp.done  |54963 +++++++++++++++++++++++++++++++++++++++++++++
 t/t9010/Xerces.cpp.src   |55052 ++++++++++++++++++++++++++++++++++++++++++++++
 t/t9010/newdata.diff0    |  Bin 0 -> 19392 bytes
 t/t9010/newdata.done     |  522 +
 t/t9010/src.diff0        |  Bin 0 -> 74 bytes
 t/t9010/src.done         |  522 +
 23 files changed, 111677 insertions(+), 86 deletions(-)
 create mode 100644 vcs-svn/sliding_window.c
 create mode 100644 vcs-svn/sliding_window.h
 create mode 100644 vcs-svn/svndiff.c
 create mode 100644 vcs-svn/svndiff.h
 create mode 100644 t/t9010/Xerces.cpp.diff0
 create mode 100644 t/t9010/Xerces.cpp.done
 create mode 100644 t/t9010/Xerces.cpp.src
 create mode 100644 t/t9010/blank.done
 create mode 100644 t/t9010/newdata.diff0
 create mode 100644 t/t9010/newdata.done
 create mode 100644 t/t9010/src.diff0
 create mode 100644 t/t9010/src.done

[1] http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff
[2] And some design notes:
http://thread.gmane.org/gmane.comp.version-control.git/150005/focus=157119

^ permalink raw reply	[flat|nested] 79+ messages in thread

* [PATCH 01/16] vcs-svn: Eliminate global byte_buffer[] array
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
@ 2010-10-11  2:37         ` Jonathan Nieder
  2010-10-11  2:39         ` [PATCH 03/16] vcs-svn: Collect line_buffer data in a struct Jonathan Nieder
                           ` (14 subsequent siblings)
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:37 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

The data stored in byte_buffer[] is always either discarded or
written to stdout immediately.  No need for it to persist between
function calls.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 1543567..f22c94f 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -14,7 +14,6 @@
 obj_pool_gen(blob, char, 4096)
 
 static char line_buffer[LINE_BUFFER_LEN];
-static char byte_buffer[COPY_BUFFER_LEN];
 static FILE *infile;
 
 int buffer_init(const char *filename)
@@ -68,6 +67,7 @@ char *buffer_read_string(uint32_t len)
 
 void buffer_copy_bytes(uint32_t len)
 {
+	char byte_buffer[COPY_BUFFER_LEN];
 	uint32_t in;
 	while (len > 0 && !feof(infile) && !ferror(infile)) {
 		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
@@ -83,6 +83,7 @@ void buffer_copy_bytes(uint32_t len)
 
 void buffer_skip_bytes(uint32_t len)
 {
+	char byte_buffer[COPY_BUFFER_LEN];
 	uint32_t in;
 	while (len > 0 && !feof(infile) && !ferror(infile)) {
 		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 03/16] vcs-svn: Collect line_buffer data in a struct
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
  2010-10-11  2:37         ` [PATCH 01/16] vcs-svn: Eliminate global byte_buffer[] array Jonathan Nieder
@ 2010-10-11  2:39         ` Jonathan Nieder
  2010-10-11  2:41         ` [PATCH 04/16] vcs-svn: Teach line_buffer to handle multiple input files Jonathan Nieder
                           ` (13 subsequent siblings)
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:39 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

Prepare for the line_buffer lib to support input from multiple files,
by collecting global state in a struct that can be easily passed around.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.c |   45 ++++++++++++++++++++++-----------------------
 vcs-svn/line_buffer.h |   11 +++++++++++
 2 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 6f32f28..e7bc230 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -7,17 +7,16 @@
 #include "line_buffer.h"
 #include "strbuf.h"
 
-#define LINE_BUFFER_LEN 10000
 #define COPY_BUFFER_LEN 4096
-
-static char line_buffer[LINE_BUFFER_LEN];
-static struct strbuf blob_buffer = STRBUF_INIT;
-static FILE *infile;
+static struct line_buffer buf_ = LINE_BUFFER_INIT;
+static struct line_buffer *buf;
 
 int buffer_init(const char *filename)
 {
-	infile = filename ? fopen(filename, "r") : stdin;
-	if (!infile)
+	buf = &buf_;
+
+	buf->infile = filename ? fopen(filename, "r") : stdin;
+	if (!buf->infile)
 		return -1;
 	return 0;
 }
@@ -25,10 +24,10 @@ int buffer_init(const char *filename)
 int buffer_deinit(void)
 {
 	int err;
-	if (infile == stdin)
-		return ferror(infile);
-	err = ferror(infile);
-	err |= fclose(infile);
+	if (buf->infile == stdin)
+		return ferror(buf->infile);
+	err = ferror(buf->infile);
+	err |= fclose(buf->infile);
 	return err;
 }
 
@@ -36,13 +35,13 @@ int buffer_deinit(void)
 char *buffer_read_line(void)
 {
 	char *end;
-	if (!fgets(line_buffer, sizeof(line_buffer), infile))
+	if (!fgets(buf->line_buffer, sizeof(buf->line_buffer), buf->infile))
 		/* Error or data exhausted. */
 		return NULL;
-	end = line_buffer + strlen(line_buffer);
+	end = buf->line_buffer + strlen(buf->line_buffer);
 	if (end[-1] == '\n')
 		end[-1] = '\0';
-	else if (feof(infile))
+	else if (feof(buf->infile))
 		; /* No newline at end of file.  That's fine. */
 	else
 		/*
@@ -51,23 +50,23 @@ char *buffer_read_line(void)
 		 * but for now let's return an error.
 		 */
 		return NULL;
-	return line_buffer;
+	return buf->line_buffer;
 }
 
 char *buffer_read_string(uint32_t len)
 {
-	strbuf_reset(&blob_buffer);
-	strbuf_fread(&blob_buffer, len, infile);
-	return ferror(infile) ? NULL : blob_buffer.buf;
+	strbuf_reset(&buf->blob_buffer);
+	strbuf_fread(&buf->blob_buffer, len, buf->infile);
+	return ferror(buf->infile) ? NULL : buf->blob_buffer.buf;
 }
 
 void buffer_copy_bytes(uint32_t len)
 {
 	char byte_buffer[COPY_BUFFER_LEN];
 	uint32_t in;
-	while (len > 0 && !feof(infile) && !ferror(infile)) {
+	while (len > 0 && !feof(buf->infile) && !ferror(buf->infile)) {
 		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
-		in = fread(byte_buffer, 1, in, infile);
+		in = fread(byte_buffer, 1, in, buf->infile);
 		len -= in;
 		fwrite(byte_buffer, 1, in, stdout);
 		if (ferror(stdout)) {
@@ -81,14 +80,14 @@ void buffer_skip_bytes(uint32_t len)
 {
 	char byte_buffer[COPY_BUFFER_LEN];
 	uint32_t in;
-	while (len > 0 && !feof(infile) && !ferror(infile)) {
+	while (len > 0 && !feof(buf->infile) && !ferror(buf->infile)) {
 		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
-		in = fread(byte_buffer, 1, in, infile);
+		in = fread(byte_buffer, 1, in, buf->infile);
 		len -= in;
 	}
 }
 
 void buffer_reset(void)
 {
-	strbuf_release(&blob_buffer);
+	strbuf_release(&buf->blob_buffer);
 }
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 9c78ae1..4ae1133 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -1,6 +1,17 @@
 #ifndef LINE_BUFFER_H_
 #define LINE_BUFFER_H_
 
+#include "strbuf.h"
+
+#define LINE_BUFFER_LEN 10000
+
+struct line_buffer {
+	char line_buffer[LINE_BUFFER_LEN];
+	struct strbuf blob_buffer;
+	FILE *infile;
+};
+#define LINE_BUFFER_INIT {"", STRBUF_INIT, NULL}
+
 int buffer_init(const char *filename);
 int buffer_deinit(void);
 char *buffer_read_line(void);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 04/16] vcs-svn: Teach line_buffer to handle multiple input files
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
  2010-10-11  2:37         ` [PATCH 01/16] vcs-svn: Eliminate global byte_buffer[] array Jonathan Nieder
  2010-10-11  2:39         ` [PATCH 03/16] vcs-svn: Collect line_buffer data in a struct Jonathan Nieder
@ 2010-10-11  2:41         ` Jonathan Nieder
  2010-10-11  2:44         ` [PATCH 05/16] vcs-svn: Make buffer_skip_bytes() report partial reads Jonathan Nieder
                           ` (12 subsequent siblings)
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:41 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

Collect the line_buffer state in a newly public line_buffer struct.
Callers can use multiple line_buffers to manage input from multiple
files at a time.

The Subversion-format delta applier will use this to stream a delta
and the preimage it applies to at the same time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.txt |    5 +++--
 vcs-svn/fast_export.c   |    6 +++---
 vcs-svn/fast_export.h   |    5 ++++-
 vcs-svn/line_buffer.c   |   20 ++++++++------------
 vcs-svn/line_buffer.h   |   14 +++++++-------
 vcs-svn/svndump.c       |   29 ++++++++++++++++-------------
 test-line-buffer.c      |   17 +++++++++--------
 7 files changed, 50 insertions(+), 46 deletions(-)

diff --git a/vcs-svn/line_buffer.txt b/vcs-svn/line_buffer.txt
index 8906fb1..f8eaa4d 100644
--- a/vcs-svn/line_buffer.txt
+++ b/vcs-svn/line_buffer.txt
@@ -14,14 +14,15 @@ Calling sequence
 
 The calling program:
 
+ - initializes a `struct line_buffer` to LINE_BUFFER_INIT
  - specifies a file to read with `buffer_init`
  - processes input with `buffer_read_line`, `buffer_read_string`,
    `buffer_skip_bytes`, and `buffer_copy_bytes`
  - closes the file with `buffer_deinit`, perhaps to start over and
    read another file.
 
-Before exiting, the caller can use `buffer_reset` to deallocate
-resources for the benefit of profiling tools.
+When finished, the caller can use `buffer_reset` to deallocate
+resources.
 
 Functions
 ---------
diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 6cfa256..260cf50 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -63,14 +63,14 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 	printf("progress Imported commit %"PRIu32".\n\n", revision);
 }
 
-void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len)
+void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len, struct line_buffer *input)
 {
 	if (mode == REPO_MODE_LNK) {
 		/* svn symlink blobs start with "link " */
-		buffer_skip_bytes(5);
+		buffer_skip_bytes(input, 5);
 		len -= 5;
 	}
 	printf("blob\nmark :%"PRIu32"\ndata %"PRIu32"\n", mark, len);
-	buffer_copy_bytes(len);
+	buffer_copy_bytes(input, len);
 	fputc('\n', stdout);
 }
diff --git a/vcs-svn/fast_export.h b/vcs-svn/fast_export.h
index 2aaaea5..054e7d5 100644
--- a/vcs-svn/fast_export.h
+++ b/vcs-svn/fast_export.h
@@ -1,11 +1,14 @@
 #ifndef FAST_EXPORT_H_
 #define FAST_EXPORT_H_
 
+#include "line_buffer.h"
+
 void fast_export_delete(uint32_t depth, uint32_t *path);
 void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
 			uint32_t mark);
 void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 			uint32_t uuid, uint32_t url, unsigned long timestamp);
-void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len);
+void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
+		      struct line_buffer *input);
 
 #endif
diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index e7bc230..806932b 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -8,20 +8,16 @@
 #include "strbuf.h"
 
 #define COPY_BUFFER_LEN 4096
-static struct line_buffer buf_ = LINE_BUFFER_INIT;
-static struct line_buffer *buf;
 
-int buffer_init(const char *filename)
+int buffer_init(struct line_buffer *buf, const char *filename)
 {
-	buf = &buf_;
-
 	buf->infile = filename ? fopen(filename, "r") : stdin;
 	if (!buf->infile)
 		return -1;
 	return 0;
 }
 
-int buffer_deinit(void)
+int buffer_deinit(struct line_buffer *buf)
 {
 	int err;
 	if (buf->infile == stdin)
@@ -32,7 +28,7 @@ int buffer_deinit(void)
 }
 
 /* Read a line without trailing newline. */
-char *buffer_read_line(void)
+char *buffer_read_line(struct line_buffer *buf)
 {
 	char *end;
 	if (!fgets(buf->line_buffer, sizeof(buf->line_buffer), buf->infile))
@@ -53,14 +49,14 @@ char *buffer_read_line(void)
 	return buf->line_buffer;
 }
 
-char *buffer_read_string(uint32_t len)
+char *buffer_read_string(struct line_buffer *buf, uint32_t len)
 {
 	strbuf_reset(&buf->blob_buffer);
 	strbuf_fread(&buf->blob_buffer, len, buf->infile);
 	return ferror(buf->infile) ? NULL : buf->blob_buffer.buf;
 }
 
-void buffer_copy_bytes(uint32_t len)
+void buffer_copy_bytes(struct line_buffer *buf, uint32_t len)
 {
 	char byte_buffer[COPY_BUFFER_LEN];
 	uint32_t in;
@@ -70,13 +66,13 @@ void buffer_copy_bytes(uint32_t len)
 		len -= in;
 		fwrite(byte_buffer, 1, in, stdout);
 		if (ferror(stdout)) {
-			buffer_skip_bytes(len);
+			buffer_skip_bytes(buf, len);
 			return;
 		}
 	}
 }
 
-void buffer_skip_bytes(uint32_t len)
+void buffer_skip_bytes(struct line_buffer *buf, uint32_t len)
 {
 	char byte_buffer[COPY_BUFFER_LEN];
 	uint32_t in;
@@ -87,7 +83,7 @@ void buffer_skip_bytes(uint32_t len)
 	}
 }
 
-void buffer_reset(void)
+void buffer_reset(struct line_buffer *buf)
 {
 	strbuf_release(&buf->blob_buffer);
 }
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 4ae1133..fb37390 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -12,12 +12,12 @@ struct line_buffer {
 };
 #define LINE_BUFFER_INIT {"", STRBUF_INIT, NULL}
 
-int buffer_init(const char *filename);
-int buffer_deinit(void);
-char *buffer_read_line(void);
-char *buffer_read_string(uint32_t len);
-void buffer_copy_bytes(uint32_t len);
-void buffer_skip_bytes(uint32_t len);
-void buffer_reset(void);
+int buffer_init(struct line_buffer *buf, const char *filename);
+int buffer_deinit(struct line_buffer *buf);
+char *buffer_read_line(struct line_buffer *buf);
+char *buffer_read_string(struct line_buffer *buf, uint32_t len);
+void buffer_copy_bytes(struct line_buffer *buf, uint32_t len);
+void buffer_skip_bytes(struct line_buffer *buf, uint32_t len);
+void buffer_reset(struct line_buffer *buf);
 
 #endif
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index 53d0215..3bba0fe 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -30,6 +30,8 @@
 /* Create memory pool for log messages */
 obj_pool_gen(log, char, 4096)
 
+static struct line_buffer input = LINE_BUFFER_INIT;
+
 static char* log_copy(uint32_t length, char *log)
 {
 	char *buffer;
@@ -113,14 +115,14 @@ static void read_props(void)
 	uint32_t key = ~0;
 	char *val = NULL;
 	char *t;
-	while ((t = buffer_read_line()) && strcmp(t, "PROPS-END")) {
+	while ((t = buffer_read_line(&input)) && strcmp(t, "PROPS-END")) {
 		if (!strncmp(t, "K ", 2)) {
 			len = atoi(&t[2]);
-			key = pool_intern(buffer_read_string(len));
-			buffer_read_line();
+			key = pool_intern(buffer_read_string(&input, len));
+			buffer_read_line(&input);
 		} else if (!strncmp(t, "V ", 2)) {
 			len = atoi(&t[2]);
-			val = buffer_read_string(len);
+			val = buffer_read_string(&input, len);
 			if (key == keys.svn_log) {
 				/* Value length excludes terminating nul. */
 				rev_ctx.log = log_copy(len + 1, val);
@@ -135,7 +137,7 @@ static void read_props(void)
 				node_ctx.type = REPO_MODE_LNK;
 			}
 			key = ~0;
-			buffer_read_line();
+			buffer_read_line(&input);
 		}
 	}
 }
@@ -177,9 +179,10 @@ static void handle_node(void)
 		node_ctx.type = node_ctx.srcMode;
 
 	if (node_ctx.mark)
-		fast_export_blob(node_ctx.type, node_ctx.mark, node_ctx.textLength);
+		fast_export_blob(node_ctx.type,
+				 node_ctx.mark, node_ctx.textLength, &input);
 	else if (node_ctx.textLength != LENGTH_UNKNOWN)
-		buffer_skip_bytes(node_ctx.textLength);
+		buffer_skip_bytes(&input, node_ctx.textLength);
 }
 
 static void handle_revision(void)
@@ -198,7 +201,7 @@ void svndump_read(const char *url)
 	uint32_t key;
 
 	reset_dump_ctx(pool_intern(url));
-	while ((t = buffer_read_line())) {
+	while ((t = buffer_read_line(&input))) {
 		val = strstr(t, ": ");
 		if (!val)
 			continue;
@@ -250,7 +253,7 @@ void svndump_read(const char *url)
 			node_ctx.propLength = atoi(val);
 		} else if (key == keys.content_length) {
 			len = atoi(val);
-			buffer_read_line();
+			buffer_read_line(&input);
 			if (active_ctx == REV_CTX) {
 				read_props();
 			} else if (active_ctx == NODE_CTX) {
@@ -258,7 +261,7 @@ void svndump_read(const char *url)
 				active_ctx = REV_CTX;
 			} else {
 				fprintf(stderr, "Unexpected content length header: %"PRIu32"\n", len);
-				buffer_skip_bytes(len);
+				buffer_skip_bytes(&input, len);
 			}
 		}
 	}
@@ -270,7 +273,7 @@ void svndump_read(const char *url)
 
 void svndump_init(const char *filename)
 {
-	buffer_init(filename);
+	buffer_init(&input, filename);
 	repo_init();
 	reset_dump_ctx(~0);
 	reset_rev_ctx(0);
@@ -285,7 +288,7 @@ void svndump_deinit(void)
 	reset_dump_ctx(~0);
 	reset_rev_ctx(0);
 	reset_node_ctx(NULL);
-	if (buffer_deinit())
+	if (buffer_deinit(&input))
 		fprintf(stderr, "Input error\n");
 	if (ferror(stdout))
 		fprintf(stderr, "Output error\n");
@@ -294,7 +297,7 @@ void svndump_deinit(void)
 void svndump_reset(void)
 {
 	log_reset();
-	buffer_reset();
+	buffer_reset(&input);
 	repo_reset();
 	reset_dump_ctx(~0);
 	reset_rev_ctx(0);
diff --git a/test-line-buffer.c b/test-line-buffer.c
index c11bf7f..f9af892 100644
--- a/test-line-buffer.c
+++ b/test-line-buffer.c
@@ -22,25 +22,26 @@ static uint32_t strtouint32(const char *s)
 
 int main(int argc, char *argv[])
 {
+	struct line_buffer buf = LINE_BUFFER_INIT;
 	char *s;
 
 	if (argc != 1)
 		usage("test-line-buffer < input.txt");
-	if (buffer_init(NULL))
+	if (buffer_init(&buf, NULL))
 		die_errno("open error");
-	while ((s = buffer_read_line())) {
-		s = buffer_read_string(strtouint32(s));
+	while ((s = buffer_read_line(&buf))) {
+		s = buffer_read_string(&buf, strtouint32(s));
 		fputs(s, stdout);
 		fputc('\n', stdout);
-		buffer_skip_bytes(1);
-		if (!(s = buffer_read_line()))
+		buffer_skip_bytes(&buf, 1);
+		if (!(s = buffer_read_line(&buf)))
 			break;
-		buffer_copy_bytes(strtouint32(s) + 1);
+		buffer_copy_bytes(&buf, strtouint32(s) + 1);
 	}
-	if (buffer_deinit())
+	if (buffer_deinit(&buf))
 		die("input error");
 	if (ferror(stdout))
 		die("output error");
-	buffer_reset();
+	buffer_reset(&buf);
 	return 0;
 }
-- 
1.7.2.3

* [PATCH 05/16] vcs-svn: Make buffer_skip_bytes() report partial reads
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
                           ` (2 preceding siblings ...)
  2010-10-11  2:41         ` [PATCH 04/16] vcs-svn: Teach line_buffer to handle multiple input files Jonathan Nieder
@ 2010-10-11  2:44         ` Jonathan Nieder
  2010-10-11  2:46         ` [PATCH 06/16] vcs-svn: Improve support for reading large files Jonathan Nieder
                           ` (11 subsequent siblings)
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:44 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

Let the caller know how many bytes were actually skipped.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
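For readers without a git tree at hand, here is a minimal standalone
sketch of the same idea on a plain FILE *.  The names skip_bytes() and
skip_exactly() are made up for illustration and are not part of the
line_buffer API; the point is only that a short return value lets the
caller notice a truncated stream.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define COPY_BUFFER_LEN 4096

/*
 * Discard up to nbytes from f and return how many bytes were
 * actually consumed.  A short count means EOF or a read error.
 */
static uint32_t skip_bytes(FILE *f, uint32_t nbytes)
{
	uint32_t done = 0;

	assert(f != NULL);
	while (done < nbytes && !feof(f) && !ferror(f)) {
		char scratch[COPY_BUFFER_LEN];
		uint32_t want = nbytes - done;
		size_t chunk = want < sizeof(scratch) ? want : sizeof(scratch);

		done += (uint32_t)fread(scratch, 1, chunk, f);
	}
	return done;
}

/* Hypothetical caller: 0 if all nbytes were skipped, -1 on a short skip. */
static int skip_exactly(FILE *f, uint32_t nbytes)
{
	return skip_bytes(f, nbytes) == nbytes ? 0 : -1;
}
```

A caller that needs to tell EOF from an I/O error can still check
ferror() on the stream after a short skip.
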
 vcs-svn/line_buffer.txt |    3 ++-
 vcs-svn/line_buffer.c   |   15 ++++++++-------
 vcs-svn/line_buffer.h   |    2 +-
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/vcs-svn/line_buffer.txt b/vcs-svn/line_buffer.txt
index f8eaa4d..d06db24 100644
--- a/vcs-svn/line_buffer.txt
+++ b/vcs-svn/line_buffer.txt
@@ -53,7 +53,8 @@ Functions
 
 `buffer_skip_bytes`::
 	Discards `len` bytes from the input stream (stopping early
-	if necessary because of an error or eof).
+	if necessary because of an error or eof).  Return value is
+	the number of bytes successfully read.
 
 `buffer_reset`::
 	Deallocates non-static buffers.
diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 806932b..999368b 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -72,15 +72,16 @@ void buffer_copy_bytes(struct line_buffer *buf, uint32_t len)
 	}
 }
 
-void buffer_skip_bytes(struct line_buffer *buf, uint32_t len)
+uint32_t buffer_skip_bytes(struct line_buffer *buf, uint32_t nbytes)
 {
-	char byte_buffer[COPY_BUFFER_LEN];
-	uint32_t in;
-	while (len > 0 && !feof(buf->infile) && !ferror(buf->infile)) {
-		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
-		in = fread(byte_buffer, 1, in, buf->infile);
-		len -= in;
+	uint32_t done = 0;
+	while (done < nbytes && !feof(buf->infile) && !ferror(buf->infile)) {
+		char byte_buffer[COPY_BUFFER_LEN];
+		uint32_t len = nbytes - done;
+		uint32_t in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
+		done += fread(byte_buffer, 1, in, buf->infile);
 	}
+	return done;
 }
 
 void buffer_reset(struct line_buffer *buf)
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index fb37390..2796ba7 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -17,7 +17,7 @@ int buffer_deinit(struct line_buffer *buf);
 char *buffer_read_line(struct line_buffer *buf);
 char *buffer_read_string(struct line_buffer *buf, uint32_t len);
 void buffer_copy_bytes(struct line_buffer *buf, uint32_t len);
-void buffer_skip_bytes(struct line_buffer *buf, uint32_t len);
+uint32_t buffer_skip_bytes(struct line_buffer *buf, uint32_t len);
 void buffer_reset(struct line_buffer *buf);
 
 #endif
-- 
1.7.2.3

* [PATCH 06/16] vcs-svn: Improve support for reading large files
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
                           ` (3 preceding siblings ...)
  2010-10-11  2:44         ` [PATCH 05/16] vcs-svn: Make buffer_skip_bytes() report partial reads Jonathan Nieder
@ 2010-10-11  2:46         ` Jonathan Nieder
  2010-10-11  2:47         ` [PATCH 07/16] vcs-svn: Add binary-safe read() function Jonathan Nieder
                           ` (10 subsequent siblings)
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:46 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

Tweak the line_buffer API to permit seeking and cat-ing segments
longer than 4 GiB.  This would be particularly useful for applying
deltas that remove a large segment from the middle of a file.

Callers would still have to be updated to take advantage of this.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Since off_t is a signed type, on systems with 32-bit file offsets,
this might make things worse.  Is that worth worrying about?
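
To make the concern concrete, here is a rough standalone sketch (not
part of the patch) of which uint32_t lengths stop being representable
once the API takes a signed offset.  The helper names are hypothetical,
and the sketch assumes off_t has no padding bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <sys/types.h>

/* Largest value of a signed integer type that is `bits` wide. */
static uintmax_t signed_max_for_bits(unsigned bits)
{
	assert(bits >= 2 && bits <= 64);
	return (UINTMAX_C(1) << (bits - 1)) - 1;
}

/* 1 if len fits in a signed offset type of the given width, else 0. */
static int length_fits(uint32_t len, unsigned offset_bits)
{
	return len <= signed_max_for_bits(offset_bits);
}

/* Same check against this platform's off_t (assumes no padding bits). */
static int length_fits_off_t(uint32_t len)
{
	return length_fits(len, (unsigned)(sizeof(off_t) * CHAR_BIT));
}
```

With a 32-bit off_t, lengths from 2^31 up to 2^32 - 1 used to be
accepted as uint32_t and no longer fit; with a 64-bit off_t every
uint32_t length fits.
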

 vcs-svn/line_buffer.c |    8 ++++----
 vcs-svn/line_buffer.h |    4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 999368b..fd1d3c3 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -56,7 +56,7 @@ char *buffer_read_string(struct line_buffer *buf, uint32_t len)
 	return ferror(buf->infile) ? NULL : buf->blob_buffer.buf;
 }
 
-void buffer_copy_bytes(struct line_buffer *buf, uint32_t len)
+void buffer_copy_bytes(struct line_buffer *buf, off_t len)
 {
 	char byte_buffer[COPY_BUFFER_LEN];
 	uint32_t in;
@@ -72,12 +72,12 @@ void buffer_copy_bytes(struct line_buffer *buf, uint32_t len)
 	}
 }
 
-uint32_t buffer_skip_bytes(struct line_buffer *buf, uint32_t nbytes)
+off_t buffer_skip_bytes(struct line_buffer *buf, off_t nbytes)
 {
-	uint32_t done = 0;
+	off_t done = 0;
 	while (done < nbytes && !feof(buf->infile) && !ferror(buf->infile)) {
 		char byte_buffer[COPY_BUFFER_LEN];
-		uint32_t len = nbytes - done;
+		off_t len = nbytes - done;
 		uint32_t in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
 		done += fread(byte_buffer, 1, in, buf->infile);
 	}
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 2796ba7..2849faa 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -16,8 +16,8 @@ int buffer_init(struct line_buffer *buf, const char *filename);
 int buffer_deinit(struct line_buffer *buf);
 char *buffer_read_line(struct line_buffer *buf);
 char *buffer_read_string(struct line_buffer *buf, uint32_t len);
-void buffer_copy_bytes(struct line_buffer *buf, uint32_t len);
-uint32_t buffer_skip_bytes(struct line_buffer *buf, uint32_t len);
+void buffer_copy_bytes(struct line_buffer *buf, off_t len);
+off_t buffer_skip_bytes(struct line_buffer *buf, off_t len);
 void buffer_reset(struct line_buffer *buf);
 
 #endif
-- 
1.7.2.3

* [PATCH 07/16] vcs-svn: Add binary-safe read() function
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
                           ` (4 preceding siblings ...)
  2010-10-11  2:46         ` [PATCH 06/16] vcs-svn: Improve support for reading large files Jonathan Nieder
@ 2010-10-11  2:47         ` Jonathan Nieder
  2010-10-11  2:47         ` [PATCH 08/16] vcs-svn: Let callers peek ahead to find stream end Jonathan Nieder
                           ` (9 subsequent siblings)
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:47 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

buffer_read_binary() writes to a strbuf so the caller does not need
to keep track of the number of bytes read.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.c |    6 ++++++
 vcs-svn/line_buffer.h |    1 +
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index fd1d3c3..6dd0189 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -56,6 +56,12 @@ char *buffer_read_string(struct line_buffer *buf, uint32_t len)
 	return ferror(buf->infile) ? NULL : buf->blob_buffer.buf;
 }
 
+void buffer_read_binary(struct strbuf *sb, uint32_t size,
+			struct line_buffer *buf)
+{
+	strbuf_fread(sb, size, buf->infile);
+}
+
 void buffer_copy_bytes(struct line_buffer *buf, off_t len)
 {
 	char byte_buffer[COPY_BUFFER_LEN];
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 2849faa..873b0e4 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -16,6 +16,7 @@ int buffer_init(struct line_buffer *buf, const char *filename);
 int buffer_deinit(struct line_buffer *buf);
 char *buffer_read_line(struct line_buffer *buf);
 char *buffer_read_string(struct line_buffer *buf, uint32_t len);
+void buffer_read_binary(struct strbuf *sb, uint32_t len, struct line_buffer *f);
 void buffer_copy_bytes(struct line_buffer *buf, off_t len);
 off_t buffer_skip_bytes(struct line_buffer *buf, off_t len);
 void buffer_reset(struct line_buffer *buf);
-- 
1.7.2.3

* [PATCH 08/16] vcs-svn: Let callers peek ahead to find stream end
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
                           ` (5 preceding siblings ...)
  2010-10-11  2:47         ` [PATCH 07/16] vcs-svn: Add binary-safe read() function Jonathan Nieder
@ 2010-10-11  2:47         ` Jonathan Nieder
  2010-10-11  2:51         ` [PATCH 09/16] vcs-svn: Allow input errors to be detected early Jonathan Nieder
                           ` (8 subsequent siblings)
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:47 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

The buffer_at_eof() function returns 1 if and only if all input from
the input stream has been exhausted (because of EOF or error).  The
implementation calls fgetc() followed by ungetc() to force an EOF
condition when there is no more input remaining.

Like many functions in the line_buffer API, this function is not
thread-safe.  It could be made to be so with a mutex if needed.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
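The fgetc()/ungetc() trick described above can be tried outside git
with a sketch like the following.  at_eof() is a hypothetical stand-in
that works on a plain FILE * and returns -1 instead of calling git's
error() helper.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return 1 if no more input can be read from f (EOF or read error),
 * 0 if at least one byte remains, and -1 if the byte that was read
 * could not be pushed back.
 */
static int at_eof(FILE *f)
{
	int ch;

	assert(f != NULL);
	ch = fgetc(f);
	if (ch == EOF)
		return 1;
	if (ungetc(ch, f) == EOF)
		return -1;
	return 0;
}
```

Standard C guarantees only one byte of pushback, which is all this
check needs.
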
 vcs-svn/line_buffer.c |   10 ++++++++++
 vcs-svn/line_buffer.h |    1 +
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 6dd0189..19caa21 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -27,6 +27,16 @@ int buffer_deinit(struct line_buffer *buf)
 	return err;
 }
 
+int buffer_at_eof(struct line_buffer *buf)
+{
+	int ch;
+	if ((ch = fgetc(buf->infile)) == EOF)
+		return 1;
+	if (ungetc(ch, buf->infile) == EOF)
+		return error("cannot unget %c: %s", ch, strerror(errno));
+	return 0;
+}
+
 /* Read a line without trailing newline. */
 char *buffer_read_line(struct line_buffer *buf)
 {
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 873b0e4..0269aed 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -14,6 +14,7 @@ struct line_buffer {
 
 int buffer_init(struct line_buffer *buf, const char *filename);
 int buffer_deinit(struct line_buffer *buf);
+int buffer_at_eof(struct line_buffer *buf);
 char *buffer_read_line(struct line_buffer *buf);
 char *buffer_read_string(struct line_buffer *buf, uint32_t len);
 void buffer_read_binary(struct strbuf *sb, uint32_t len, struct line_buffer *f);
-- 
1.7.2.3

* [PATCH 09/16] vcs-svn: Allow input errors to be detected early
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
                           ` (6 preceding siblings ...)
  2010-10-11  2:47         ` [PATCH 08/16] vcs-svn: Let callers peek ahead to find stream end Jonathan Nieder
@ 2010-10-11  2:51         ` Jonathan Nieder
  2010-10-11  2:52         ` [PATCH 10/16] vcs-svn: Allow character-oriented input Jonathan Nieder
                           ` (7 subsequent siblings)
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:51 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

Add a buffer_ferror() function to read the error flag from the input
stream, so callers can do:

	some_error_prone_operation(f, ...);
	if (buffer_ferror(f))
		return error("input error: %s", strerror(errno));

instead of waiting until it is time to close the file.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.c |    5 +++++
 vcs-svn/line_buffer.h |    1 +
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 19caa21..43da509 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -27,6 +27,11 @@ int buffer_deinit(struct line_buffer *buf)
 	return err;
 }
 
+int buffer_ferror(struct line_buffer *buf)
+{
+	return ferror(buf->infile);
+}
+
 int buffer_at_eof(struct line_buffer *buf)
 {
 	int ch;
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 0269aed..4899289 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -14,6 +14,7 @@ struct line_buffer {
 
 int buffer_init(struct line_buffer *buf, const char *filename);
 int buffer_deinit(struct line_buffer *buf);
+int buffer_ferror(struct line_buffer *buf);
 int buffer_at_eof(struct line_buffer *buf);
 char *buffer_read_line(struct line_buffer *buf);
 char *buffer_read_string(struct line_buffer *buf, uint32_t len);
-- 
1.7.2.3

* [PATCH 10/16] vcs-svn: Allow character-oriented input
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
                           ` (7 preceding siblings ...)
  2010-10-11  2:51         ` [PATCH 09/16] vcs-svn: Allow input errors to be detected early Jonathan Nieder
@ 2010-10-11  2:52         ` Jonathan Nieder
  2010-10-11  2:53         ` [PATCH 11/16] vcs-svn: Add code to maintain a sliding view of a file Jonathan Nieder
                           ` (6 subsequent siblings)
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:52 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

buffer_read_char() can be used in place of buffer_read_string(1)
to avoid consuming valuable static buffer space.  The delta applier
will use this to read variable-length integers one byte at a time.

Underneath, it is fgetc(), wrapped so the line_buffer library can
maintain its role as gatekeeper of input.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.c |    5 +++++
 vcs-svn/line_buffer.h |    1 +
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 43da509..c54031b 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -42,6 +42,11 @@ int buffer_at_eof(struct line_buffer *buf)
 	return 0;
 }
 
+int buffer_read_char(struct line_buffer *buf)
+{
+	return fgetc(buf->infile);
+}
+
 /* Read a line without trailing newline. */
 char *buffer_read_line(struct line_buffer *buf)
 {
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 4899289..2375ee1 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -18,6 +18,7 @@ int buffer_ferror(struct line_buffer *buf);
 int buffer_at_eof(struct line_buffer *buf);
 char *buffer_read_line(struct line_buffer *buf);
 char *buffer_read_string(struct line_buffer *buf, uint32_t len);
+int buffer_read_char(struct line_buffer *buf);
 void buffer_read_binary(struct strbuf *sb, uint32_t len, struct line_buffer *f);
 void buffer_copy_bytes(struct line_buffer *buf, off_t len);
 off_t buffer_skip_bytes(struct line_buffer *buf, off_t len);
-- 
1.7.2.3

* [PATCH 11/16] vcs-svn: Add code to maintain a sliding view of a file
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
                           ` (8 preceding siblings ...)
  2010-10-11  2:52         ` [PATCH 10/16] vcs-svn: Allow character-oriented input Jonathan Nieder
@ 2010-10-11  2:53         ` Jonathan Nieder
  2010-10-11  2:55         ` [PATCH 12/16] vcs-svn: Learn to parse variable-length integers Jonathan Nieder
                           ` (5 subsequent siblings)
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:53 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

Subversion's delta format has the convenient property that applying
each section of the delta only requires examining (and keeping in
memory) a small portion of the preimage.  At any moment, this portion
begins at a well-defined file offset and has a well-defined length,
and as the delta is applied, it moves from the beginning to the end
of the file.  Add a move_window() function to keep track of such a
window into a file.

You can use it like this:

	struct line_buffer preimage = LINE_BUFFER_INIT;
	buffer_init(&preimage, NULL);
	struct view window = {&preimage, 0, STRBUF_INIT};
	move_window(&window, 3, 7);	/* (1) */
	move_window(&window, 5, 5);	/* (2) */
	move_window(&window, 12, 2);	/* (3) */
	strbuf_release(&window.buf);
	buffer_deinit(&preimage);

In this example: (1) reads 10 bytes and discards the first 3;
(2) discards the first 2, which are not needed any more; and (3)
skips 2 bytes and reads 2 new bytes to work with.

Whenever move_window() returns, the file position indicator is at
position window->off + window->buf.len and the data from positions
window->off to the current file position are stored in window->buf.

This function does only sequential access and never seeks, so it
can be safely used on pipes and sockets.

On end-of-file, move_window() just silently reads less than the
caller requested.  On other errors, it prints a message to stderr
and returns -1.

Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
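To experiment with the window arithmetic described above without
strbuf or line_buffer, a simplified standalone sketch might look like
this.  struct window, discard() and slide() are hypothetical names;
the sketch uses long offsets and a malloc'd buffer, and leaves out the
overflow checks that move_window() performs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct view: buf holds file bytes [off, off + len). */
struct window {
	FILE *file;
	long off;
	size_t len;
	char *buf;
};

/* Read and throw away up to n bytes; return the number consumed. */
static long discard(FILE *f, long n)
{
	long done = 0;

	while (done < n && fgetc(f) != EOF)
		done++;
	return done;
}

/*
 * Slide the window so it starts at off and covers up to len bytes.
 * Only reads forward, never seeks.  Returns 0 on success (possibly
 * with a short window at end of file) and -1 on error.
 */
static int slide(struct window *w, long off, size_t len)
{
	long file_pos;
	char *grown;

	assert(w && w->file);
	file_pos = w->off + (long)w->len;
	if (off < w->off || off + (long)len < file_pos)
		return -1;	/* the window may not move left */

	if (off < file_pos) {
		/* Keep the overlap, dropping the bytes before off. */
		size_t drop = (size_t)(off - w->off);

		memmove(w->buf, w->buf + drop, w->len - drop);
		w->len -= drop;
	} else {
		/* No overlap: skip the gap between old and new window. */
		w->len = 0;
		if (discard(w->file, off - file_pos) != off - file_pos) {
			if (ferror(w->file))
				return -1;
			w->off = off;	/* view ends early */
			return 0;
		}
	}
	grown = realloc(w->buf, len ? len : 1);
	if (!grown)
		return -1;
	w->buf = grown;
	/* Top up the window with the bytes that are still missing. */
	w->len += fread(w->buf + w->len, 1, len - w->len, w->file);
	if (ferror(w->file))
		return -1;
	w->off = off;
	return 0;
}
```

Run against a 20-byte input, the three calls from the example in the
commit message (3, 7), (5, 5) and (12, 2) leave the window holding
bytes 3..9, then 5..9, then 12..13.
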
 Makefile                 |    5 ++-
 vcs-svn/sliding_window.c |   65 ++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/sliding_window.h |   14 ++++++++++
 vcs-svn/LICENSE          |    2 +
 4 files changed, 84 insertions(+), 2 deletions(-)
 create mode 100644 vcs-svn/sliding_window.c
 create mode 100644 vcs-svn/sliding_window.h

diff --git a/Makefile b/Makefile
index 1f1ce04..d99da33 100644
--- a/Makefile
+++ b/Makefile
@@ -1765,7 +1765,8 @@ endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
 VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o \
-	vcs-svn/repo_tree.o vcs-svn/fast_export.o vcs-svn/svndump.o
+	vcs-svn/repo_tree.o vcs-svn/fast_export.o vcs-svn/svndump.o \
+	vcs-svn/sliding_window.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1892,7 +1893,7 @@ xdiff-interface.o $(XDIFF_OBJS): \
 $(VCSSVN_OBJS): \
 	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h \
 	vcs-svn/line_buffer.h vcs-svn/repo_tree.h vcs-svn/fast_export.h \
-	vcs-svn/svndump.h
+	vcs-svn/svndump.h vcs-svn/sliding_window.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/vcs-svn/sliding_window.c b/vcs-svn/sliding_window.c
new file mode 100644
index 0000000..8273970
--- /dev/null
+++ b/vcs-svn/sliding_window.c
@@ -0,0 +1,65 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+#include "sliding_window.h"
+#include "line_buffer.h"
+#include "strbuf.h"
+
+static void strbuf_remove_from_left(struct strbuf *sb, size_t nbytes)
+{
+	assert(nbytes <= sb->len);
+	memmove(sb->buf, sb->buf + nbytes, sb->len - nbytes);
+	strbuf_setlen(sb, sb->len - nbytes);
+}
+
+static int check_overflow(off_t a, size_t b)
+{
+	if ((off_t) b < 0)
+		return error("Unrepresentable length: "
+				"%"PRIu64" > OFF_MAX", (uint64_t) b);
+	if (signed_add_overflows(a, (off_t) b))
+		return error("Unrepresentable offset: "
+				"%"PRIu64" + %"PRIu64" > OFF_MAX",
+				(uint64_t) a, (uint64_t) b);
+	return 0;
+}
+
+int move_window(struct view *view, off_t off, size_t len)
+{
+	off_t file_offset;
+	assert(view && view->file);
+	assert(!check_overflow(view->off, view->buf.len));
+
+	if (check_overflow(off, len))
+		return -1;
+	if (off < view->off || off + len < view->off + view->buf.len)
+		return error("Invalid delta: window slides left");
+
+	file_offset = view->off + view->buf.len;
+	if (off < file_offset)
+		/* Move the overlapping region into place. */
+		strbuf_remove_from_left(&view->buf, off - view->off);
+	else
+		strbuf_setlen(&view->buf, 0);
+	if (off > file_offset) {
+		/* Seek ahead to skip the gap. */
+		const off_t gap = off - file_offset;
+		const off_t nread = buffer_skip_bytes(view->file, gap);
+		if (nread != gap) {
+			if (!buffer_ferror(view->file))	/* View ends early. */
+				goto done;
+			return error("Cannot seek forward in input: %s",
+				     strerror(errno));
+		}
+		file_offset += gap;
+	}
+	buffer_read_binary(&view->buf, len - view->buf.len, view->file);
+	if (buffer_ferror(view->file))
+		return error("Cannot read preimage: %s", strerror(errno));
+ done:
+	view->off = off;
+	return 0;
+}
diff --git a/vcs-svn/sliding_window.h b/vcs-svn/sliding_window.h
new file mode 100644
index 0000000..b9f0552
--- /dev/null
+++ b/vcs-svn/sliding_window.h
@@ -0,0 +1,14 @@
+#ifndef SLIDING_WINDOW_H_
+#define SLIDING_WINDOW_H_
+
+#include "strbuf.h"
+
+struct view {
+	struct line_buffer *file;
+	off_t off;
+	struct strbuf buf;
+};
+
+extern int move_window(struct view *view, off_t off, size_t len);
+
+#endif
diff --git a/vcs-svn/LICENSE b/vcs-svn/LICENSE
index 0a5e3c4..805882c 100644
--- a/vcs-svn/LICENSE
+++ b/vcs-svn/LICENSE
@@ -1,6 +1,8 @@
 Copyright (C) 2010 David Barr <david.barr@cordelta.com>.
 All rights reserved.
 
+Copyright (C) 2010 Jonathan Nieder <jrnieder@gmail.com>.
+
 Copyright (C) 2008 Jason Evans <jasone@canonware.com>.
 All rights reserved.
 
-- 
1.7.2.3

* [PATCH 12/16] vcs-svn: Learn to parse variable-length integers
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
                           ` (9 preceding siblings ...)
  2010-10-11  2:53         ` [PATCH 11/16] vcs-svn: Add code to maintain a sliding view of a file Jonathan Nieder
@ 2010-10-11  2:55         ` Jonathan Nieder
  2010-10-11  2:58         ` [PATCH 13/16] vcs-svn: Learn to check for SVN\0 magic Jonathan Nieder
                           ` (4 subsequent siblings)
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:55 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

The humble beginnings of the svn-format delta applier.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Maybe this should be squashed with patch 16.

Ideas for eliminating the code duplication?
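
One possible answer to the duplication question, offered only as a
sketch: keep a single decoding loop and hand it a byte-source
callback.  The names decode_vli(), next_byte_fn, struct mem_source,
mem_next() and file_next() are all hypothetical, and the sketch drops
the remaining-length bookkeeping that read_int() does through *len.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define VLI_CONTINUE	0x80
#define VLI_DIGIT_MASK	0x7f
#define VLI_BITS_PER_DIGIT 7

/* A byte source: returns the next byte (0..255) or EOF when exhausted. */
typedef int (*next_byte_fn)(void *data);

/*
 * Decode one variable-length integer from the source.  Returns 0 and
 * stores the value on success, -1 if the input ends before a byte
 * without the continuation bit is seen.
 */
static int decode_vli(next_byte_fn next, void *data, uintmax_t *result)
{
	uintmax_t rv = 0;
	int ch;

	assert(next && result);
	while ((ch = next(data)) != EOF) {
		rv <<= VLI_BITS_PER_DIGIT;
		rv += (ch & VLI_DIGIT_MASK);
		if (!(ch & VLI_CONTINUE)) {
			*result = rv;
			return 0;
		}
	}
	return -1;
}

/* Source backed by the memory range [pos, end). */
struct mem_source {
	const unsigned char *pos;
	const unsigned char *end;
};

static int mem_next(void *data)
{
	struct mem_source *src = data;

	return src->pos == src->end ? EOF : *src->pos++;
}

/* Source backed by a stdio stream; data is the FILE pointer. */
static int file_next(void *data)
{
	return fgetc(data);
}
```

With this shape, the bytes 0x81 0x00 decode to 128 from either source.
A length budget like the one in read_int() could live in a small
wrapper source that counts down and returns EOF at zero.
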

 vcs-svn/svndiff.c |   59 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 59 insertions(+), 0 deletions(-)
 create mode 100644 vcs-svn/svndiff.c

diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
new file mode 100644
index 0000000..4d122a5
--- /dev/null
+++ b/vcs-svn/svndiff.c
@@ -0,0 +1,59 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+#include "line_buffer.h"
+
+/*
+ * svndiff0 applier
+ *
+ * See http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff.
+ *
+ * int ::= highdigit* lowdigit;
+ * highdigit ::= # binary 1000 0000 OR-ed with 7 bit value;
+ * lowdigit ::= # 7 bit value;
+ */
+
+#define VLI_CONTINUE	0x80
+#define VLI_DIGIT_MASK	0x7f
+#define VLI_BITS_PER_DIGIT 7
+
+static int read_int(struct line_buffer *in, uintmax_t *result, off_t *len)
+{
+	off_t sz = *len;
+	uintmax_t rv = 0;
+	while (sz) {
+		int ch = buffer_read_char(in);
+		if (ch == EOF)
+			break;
+		sz--;
+		rv <<= VLI_BITS_PER_DIGIT;
+		rv += (ch & VLI_DIGIT_MASK);
+		if (!(ch & VLI_CONTINUE)) {
+			*result = rv;
+			*len = sz;
+			return 0;
+		}
+	}
+	return error("Invalid delta: incomplete integer %"PRIuMAX, rv);
+}
+
+static int parse_int(const char **buf, size_t *result, const char *end)
+{
+	const char *pos;
+	size_t rv = 0;
+	for (pos = *buf; pos != end; pos++) {
+		unsigned char ch = *pos;
+		rv <<= VLI_BITS_PER_DIGIT;
+		rv += (ch & VLI_DIGIT_MASK);
+		if (!(ch & VLI_CONTINUE)) {
+			*result = rv;
+			*buf = pos + 1;
+			return 0;
+		}
+	}
+	return error("Invalid instruction: incomplete integer %"PRIu64,
+		     (uint64_t) rv);
+}
-- 
1.7.2.3

* [PATCH 13/16] vcs-svn: Learn to check for SVN\0 magic
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
                           ` (10 preceding siblings ...)
  2010-10-11  2:55         ` [PATCH 12/16] vcs-svn: Learn to parse variable-length integers Jonathan Nieder
@ 2010-10-11  2:58         ` Jonathan Nieder
  2010-10-11  2:59         ` [PATCH 14/16] compat: helper for detecting unsigned overflow Jonathan Nieder
                           ` (3 subsequent siblings)
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:58 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

The magic number of svn deltas is SVN followed by a null byte.
An alternative format (with compressed text) uses magic number SVN\1,
but that is deprecated in favor of compressing the deltas as a whole
as far as I can tell.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
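The header check is easy to try in isolation.  The sketch below is a
hypothetical check_magic() on a plain FILE * that reads into a small
stack buffer, so there is nothing to release on the error paths; it
does not track a remaining-length counter the way read_magic() does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 0 if the stream starts with 'S' 'V' 'N' '\0', -1 otherwise. */
static int check_magic(FILE *in)
{
	static const char magic[] = {'S', 'V', 'N', '\0'};
	char header[sizeof(magic)];

	assert(in != NULL);
	if (fread(header, 1, sizeof(header), in) != sizeof(header))
		return -1;	/* too short to hold a header */
	/* memcmp, not strcmp: the magic contains a NUL byte. */
	return memcmp(header, magic, sizeof(magic)) ? -1 : 0;
}
```

On success the stream is left positioned just past the four header
bytes, ready for the first window.
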
 vcs-svn/svndiff.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index 4d122a5..df0b1a2 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -11,6 +11,7 @@
  *
  * See http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff.
  *
+ * svndiff0 ::= 'SVN\0' window window*;
  * int ::= highdigit* lowdigit;
  * highdigit ::= # binary 1000 0000 OR-ed with 7 bit value;
  * lowdigit ::= # 7 bit value;
@@ -20,6 +21,23 @@
 #define VLI_DIGIT_MASK	0x7f
 #define VLI_BITS_PER_DIGIT 7
 
+static int read_magic(struct line_buffer *in, off_t *len)
+{
+	static const char magic[] = {'S', 'V', 'N', '\0'};
+	struct strbuf sb = STRBUF_INIT;
+	if (*len < sizeof(magic))
+		return error("Invalid delta: no file type header");
+	buffer_read_binary(&sb, sizeof(magic), in);
+	if (sb.len != sizeof(magic))
+		return error("Invalid delta: no file type header");
+	if (memcmp(sb.buf, magic, sizeof(magic)))
+		return error("Unrecognized file type %.*s",
+			     (int) sizeof(magic), sb.buf);
+	*len -= sizeof(magic);
+	strbuf_release(&sb);
+	return 0;
+}
+
 static int read_int(struct line_buffer *in, uintmax_t *result, off_t *len)
 {
 	off_t sz = *len;
-- 
1.7.2.3

* [PATCH 14/16] compat: helper for detecting unsigned overflow
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
                           ` (11 preceding siblings ...)
  2010-10-11  2:58         ` [PATCH 13/16] vcs-svn: Learn to check for SVN\0 magic Jonathan Nieder
@ 2010-10-11  2:59         ` Jonathan Nieder
  2010-10-11  3:00         ` [PATCH 15/16] t9010 (svn-fe): Eliminate dependency on svn perl bindings Jonathan Nieder
                           ` (2 subsequent siblings)
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  2:59 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

The idiom (a + b < a) works fine for detecting that an unsigned
integer has overflowed, but the more explicit

	unsigned_add_overflows(a, b)

might be easier to read.

Define such a macro, expanding roughly to ((b) > UINT_MAX - (a)).
Because the expansion uses each argument only once outside of sizeof()
expressions, it is safe to use this macro with arguments that have
side-effects.
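
A standalone sketch of how a caller might use the macro follows.  The
three macros are copied from git-compat-util.h as patched here;
checked_add_u32() is a hypothetical caller made up for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Same definitions as in the patch; bitsizeof() mirrors git-compat-util.h. */
#define bitsizeof(x) (CHAR_BIT * sizeof(x))
#define maximum_unsigned_value_of_type(a) \
    (UINTMAX_MAX >> (bitsizeof(uintmax_t) - bitsizeof(a)))
#define unsigned_add_overflows(a, b) \
    ((b) > maximum_unsigned_value_of_type(a) - (a))

/* Hypothetical caller: add two 32-bit lengths, refusing to wrap around. */
static int checked_add_u32(uint32_t a, uint32_t b, uint32_t *sum)
{
	if (unsigned_add_overflows(a, b))
		return -1;	/* a + b would not fit in 32 bits */
	*sum = a + b;
	return 0;
}
```

The maximum is derived from the type of the first argument, so both
operands should have the same unsigned type, as they do above.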

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 git-compat-util.h |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 2af8d3e..817f045 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -31,6 +31,9 @@
 #define maximum_signed_value_of_type(a) \
     (INTMAX_MAX >> (bitsizeof(intmax_t) - bitsizeof(a)))
 
+#define maximum_unsigned_value_of_type(a) \
+    (UINTMAX_MAX >> (bitsizeof(uintmax_t) - bitsizeof(a)))
+
 /*
  * Signed integer overflow is undefined in C, so here's a helper macro
  * to detect if the sum of two integers will overflow.
@@ -40,6 +43,9 @@
 #define signed_add_overflows(a, b) \
     ((b) > maximum_signed_value_of_type(a) - (a))
 
+#define unsigned_add_overflows(a, b) \
+    ((b) > maximum_unsigned_value_of_type(a) - (a))
+
 #ifdef __GNUC__
 #define TYPEOF(x) (__typeof__(x))
 #else
-- 
1.7.2.3

* [PATCH 15/16] t9010 (svn-fe): Eliminate dependency on svn perl bindings
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
                           ` (12 preceding siblings ...)
  2010-10-11  2:59         ` [PATCH 14/16] compat: helper for detecting unsigned overflow Jonathan Nieder
@ 2010-10-11  3:00         ` Jonathan Nieder
  2010-10-11  3:11         ` [PATCH 02/16] vcs-svn: Replace buffer_read_string() memory pool with a strbuf Jonathan Nieder
  2010-10-11  4:01         ` [PATCH/RFC 16'/16] vcs-svn: Add svn delta parser Jonathan Nieder
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  3:00 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

From: Ramkumar Ramachandra <artagnon@gmail.com>

The svn-fe test script only requires git and the svn command-line
tools.  Make these tests easier to read and run by not using the perl
libsvn bindings and instead duplicating only the relevant code from
lib-git-svn.sh.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 t/t9010-svn-fe.sh |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/t/t9010-svn-fe.sh b/t/t9010-svn-fe.sh
index a713dfc..fd851a4 100755
--- a/t/t9010-svn-fe.sh
+++ b/t/t9010-svn-fe.sh
@@ -2,9 +2,19 @@
 
 test_description='check svn dumpfile importer'
 
-. ./lib-git-svn.sh
+. ./test-lib.sh
 
-test_dump() {
+svnconf=$PWD/svnconf
+export svnconf
+
+svn_cmd () {
+	subcommand=$1 &&
+	shift &&
+	mkdir -p "$svnconf" &&
+	svn "$subcommand" --config-dir "$svnconf" "$@"
+}
+
+test_dump () {
 	label=$1
 	dump=$2
 	test_expect_success "$dump" '
-- 
1.7.2.3

* [PATCH 02/16] vcs-svn: Replace buffer_read_string() memory pool with a strbuf
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
                           ` (13 preceding siblings ...)
  2010-10-11  3:00         ` [PATCH 15/16] t9010 (svn-fe): Eliminate dependency on svn perl bindings Jonathan Nieder
@ 2010-10-11  3:11         ` Jonathan Nieder
  2010-10-11  4:01         ` [PATCH/RFC 16'/16] vcs-svn: Add svn delta parser Jonathan Nieder
  15 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  3:11 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

The buffer_read_string() function returns a temporary string of
size specified by the caller.  It currently uses an obj_pool to
store the return value, but that is overkill: all we need is a
buffer that can grow between requests to accommodate larger
strings.

Use a strbuf instead.
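
The idea of a single buffer that grows between requests can be
sketched without git's strbuf.  Everything named here (struct
reuse_buf, reuse_buf_fread()) is a hypothetical stand-in written for
this illustration; the patch itself uses strbuf_reset() and
strbuf_fread().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for git's strbuf: one buffer reused across reads. */
struct reuse_buf {
	char *buf;
	size_t len, alloc;
};

/*
 * Read up to len bytes from in, reusing (and growing if needed) the
 * buffer left over from the previous call.  Returns a NUL-terminated
 * string that stays valid until the next call, or NULL on allocation
 * or read error.
 */
static char *reuse_buf_fread(struct reuse_buf *rb, size_t len, FILE *in)
{
	if (len == SIZE_MAX)
		return NULL;	/* len + 1 would wrap around */
	if (len + 1 > rb->alloc) {
		char *grown = realloc(rb->buf, len + 1);
		if (!grown)
			return NULL;
		rb->buf = grown;
		rb->alloc = len + 1;
	}
	rb->len = fread(rb->buf, 1, len, in);
	rb->buf[rb->len] = '\0';
	return ferror(in) ? NULL : rb->buf;
}
```

As with buffer_read_string(), a short read is not an error in this
sketch: the caller gets whatever was available and can check rb->len.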

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
[Resent after messing up the message header; sorry for the noise.]

 vcs-svn/line_buffer.c |   16 ++++++----------
 1 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index f22c94f..6f32f28 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -5,15 +5,13 @@
 
 #include "git-compat-util.h"
 #include "line_buffer.h"
-#include "obj_pool.h"
+#include "strbuf.h"
 
 #define LINE_BUFFER_LEN 10000
 #define COPY_BUFFER_LEN 4096
 
-/* Create memory pool for char sequence of known length */
-obj_pool_gen(blob, char, 4096)
-
 static char line_buffer[LINE_BUFFER_LEN];
+static struct strbuf blob_buffer = STRBUF_INIT;
 static FILE *infile;
 
 int buffer_init(const char *filename)
@@ -58,11 +56,9 @@ char *buffer_read_line(void)
 
 char *buffer_read_string(uint32_t len)
 {
-	char *s;
-	blob_free(blob_pool.size);
-	s = blob_pointer(blob_alloc(len + 1));
-	s[fread(s, 1, len, infile)] = '\0';
-	return ferror(infile) ? NULL : s;
+	strbuf_reset(&blob_buffer);
+	strbuf_fread(&blob_buffer, len, infile);
+	return ferror(infile) ? NULL : blob_buffer.buf;
 }
 
 void buffer_copy_bytes(uint32_t len)
@@ -94,5 +90,5 @@ void buffer_skip_bytes(uint32_t len)
 
 void buffer_reset(void)
 {
-	blob_reset();
+	strbuf_release(&blob_buffer);
 }
-- 
1.7.2.3

* [PATCH/RFC 16'/16] vcs-svn: Add svn delta parser
  2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
                           ` (14 preceding siblings ...)
  2010-10-11  3:11         ` [PATCH 02/16] vcs-svn: Replace buffer_read_string() memory pool with a strbuf Jonathan Nieder
@ 2010-10-11  4:01         ` Jonathan Nieder
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
  15 siblings, 1 reply; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-11  4:01 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

Implement an svndiff0 interpreter, for use by the dumpfilev3
importer.  For simplicity, it is slower than it needs to be (e.g.,
it does not use fseek() on input).

This is based only on the spec and not Subversion's implementation of
the svndiff0 spec.
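
One part the grammar comment in vcs-svn/svndiff.c does spell out is
the integer encoding (int ::= highdigit* lowdigit).  A minimal sketch
of a decoder working on a byte array rather than a line_buffer is
below.  It assumes digits arrive most significant first, which is my
reading of the svndiff notes; decode_vli() and VLI_CONTINUE are names
chosen for this sketch, while VLI_DIGIT_MASK and VLI_BITS_PER_DIGIT
match the existing defines.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define VLI_CONTINUE	0x80	/* assumed name for the highdigit flag */
#define VLI_DIGIT_MASK	0x7f
#define VLI_BITS_PER_DIGIT 7

/*
 * Decode one variable-length integer from buf.  On success, store the
 * value in *result and the number of bytes consumed in *used, and
 * return 0.  Return -1 on overflow or truncated input.
 */
static int decode_vli(const unsigned char *buf, size_t len,
		      uintmax_t *result, size_t *used)
{
	uintmax_t rv = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char ch = buf[i];

		/* Shifting in 7 more bits must not push set bits off the top. */
		if (rv >> (sizeof(rv) * CHAR_BIT - VLI_BITS_PER_DIGIT))
			return -1;
		rv = (rv << VLI_BITS_PER_DIGIT) | (ch & VLI_DIGIT_MASK);
		if (!(ch & VLI_CONTINUE)) {
			*result = rv;	/* lowdigit ends the integer */
			*used = i + 1;
			return 0;
		}
	}
	return -1;	/* ran out of input before a lowdigit */
}
```

Under that assumption the two bytes 0x82 0x2c decode to
(2 << 7) | 44 = 300.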

The tests come from various deltas encountered in importing the
Apache SVN repo.

The svndiff0 semantics are not completely documented, so some of
this work relied on guesswork, and the implementation is not complete.

This version of the patch omits the enormous Xerces.cpp test (which
is not so interesting because it passes, anyway).

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: Thomas Rast <trast@student.ethz.ch>
Helped-by: David Barr <david.barr@cordelta.com>
Not-signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
[alternate ending, for those who do not like reading 3-MiB messages -
sorry about that]

Still not signed off because I haven't checked the copyright of the
tests.  I would prefer tiny deltas or deltas of some public-domain
work (e.g., drafts of foundation documents of some country).

This patch owes a great deal to David and Ram, probably more than
it owes to me.  David made lots and lots of fixes.  Ram introduced
the tests and helped with design.  Thomas provided an early sanity
check for code clarity.

Thanks for reading.

 Makefile              |    4 +-
 t/t9010-svn-fe.sh     |   14 ++
 t/t9010/newdata.diff0 |  Bin 0 -> 19392 bytes
 t/t9010/newdata.done  |  522 +++++++++++++++++++++++++++++++++++++++++++++++++
 t/t9010/src.diff0     |  Bin 0 -> 74 bytes
 t/t9010/src.done      |  522 +++++++++++++++++++++++++++++++++++++++++++++++++
 test-svn-fe.c         |   37 +++-
 vcs-svn/svndiff.c     |  265 +++++++++++++++++++++++++
 vcs-svn/svndiff.h     |    9 +
 9 files changed, 1364 insertions(+), 9 deletions(-)
 create mode 100644 t/t9010/blank.done
 create mode 100644 t/t9010/newdata.diff0
 create mode 100644 t/t9010/newdata.done
 create mode 100644 t/t9010/src.diff0
 create mode 100644 t/t9010/src.done
 create mode 100644 vcs-svn/svndiff.h

diff --git a/Makefile b/Makefile
index d99da33..966f5c7 100644
--- a/Makefile
+++ b/Makefile
@@ -1766,7 +1766,7 @@ XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
 VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o \
 	vcs-svn/repo_tree.o vcs-svn/fast_export.o vcs-svn/svndump.o \
-	vcs-svn/sliding_window.o
+	vcs-svn/sliding_window.o vcs-svn/svndiff.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1893,7 +1893,7 @@ xdiff-interface.o $(XDIFF_OBJS): \
 $(VCSSVN_OBJS): \
 	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h \
 	vcs-svn/line_buffer.h vcs-svn/repo_tree.h vcs-svn/fast_export.h \
-	vcs-svn/svndump.h vcs-svn/sliding_window.h
+	vcs-svn/sliding_window.h vcs-svn/svndump.h vcs-svn/svndiff.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/t/t9010-svn-fe.sh b/t/t9010-svn-fe.sh
index fd851a4..9e61608 100755
--- a/t/t9010-svn-fe.sh
+++ b/t/t9010-svn-fe.sh
@@ -14,6 +14,18 @@ svn_cmd () {
 	svn "$subcommand" --config-dir "$svnconf" "$@"
 }
 
+test_delta () {
+	delta="$TEST_DIRECTORY/$1"
+	preimage="$TEST_DIRECTORY/$2"
+	expect="$TEST_DIRECTORY/$3"
+	expectation=${4:-success}
+	test_expect_$expectation "$delta" '
+		delta_len=$(wc -c <"$delta") &&
+		test-svn-fe -d "$preimage" "$delta" $delta_len >actual &&
+		test_cmp "$expect" actual
+	'
+}
+
 test_dump () {
 	label=$1
 	dump=$2
@@ -38,5 +50,7 @@ test_dump () {
 }
 
 test_dump simple t9135/svn.dump
+test_delta t9010/newdata.diff0 t9010/blank.done t9010/newdata.done
+test_delta t9010/src.diff0 t9010/newdata.done t9010/src.done failure
 
 test_done
diff --git a/t/t9010/blank.done b/t/t9010/blank.done
new file mode 100644
index 0000000..e69de29
diff --git a/t/t9010/newdata.diff0 b/t/t9010/newdata.diff0
new file mode 100644
index 0000000000000000000000000000000000000000..57813032b74ff90fae130841a2a32ea909d489cc
GIT binary patch
literal 19392
zcmdU%+j87ic82F7SA2>cdMctkY*N;EY>%XgEK&4Wj%2DtjZ_s^F3{a*HcT|o1E5KF
zu9Bz7>*o8{J^+27WKZQLEL-dY_QKlhy#Bqw%lAM2?svcY@BjJ!pZNd3`2X3PXJ_A^
zyR(beuP=VQbZ_2X{BVBu?%?v>vv+^HeB?%-kBVxVS6x0Gb@?wH@7=4c?cAcC7PBIs
zx~y|Q{r*MP<-h-rgM;VU<oc?qSJm{<f#aukFW;WI$vms7ymX&s%WN{wN0WN7sH@i1
zl`ERbNxjUgWmEs0PrCM`$~#->T-Ri?Sut@%>&p7-id%KHd-AupuO7|2Zu#i+^s-(x
zlYI2=b(No7G*_pAIwVj3HPC2l*UjOuo9Av)vx};8^~@E#YO+aJ+~jUrw3Aia7Gj87
zWj9$-X5%uCT+}YV$(uDdIXHOZxrEeWQFPE~-BHFqX71sG2M<SIJQzLv`hMJb*rk8#
zT$VL;bv&uF=6G6HSvg%E|C~2@b=_oD_n*hJf`6{^qFEeYwOw9R$3GW~<F;#x>%0lF
z3=~Ig(dGXaWRIhiGF05O$5k<@%j`O@4i4VtWuCR+E)|=1>%4pH+O8-|H_c~7Ra94Q
zQJCP|qtX2Xt!eA>20guO+g08^ItXl^1$9fhZke%XQ$mxkS_v9yPIH$Gz`dL2ZT~*7
zxX9LSoV!&E`SIG-D6~myqyTC7+2wb~ZoKMT!GEo*>ds|heUtxvRW$j6d!IzUxRHyz
z-D+Rlpv<bPRd$t+va70Qx07^r(0%I-ie0$FqH^s7>pgUOYf_Ozuyfgdxbw)J|588+
zaiPB@qO7`}$g!&%cU!MYwv``if3sC7Ltm`Qu2?2BbZN7md*?>(<qUZynr>R>e$SYa
ziR0+q^B4EsRh^Y)qR-6v;*O*4@w0i3A0~<6CT)Q^u#>?Ce_rA9<9w3Ap)Q_d_EjCW
zh@YE2=Jg^!aWAXz_O@=O?XlO)#=Vnr+_KCj(bJ$ZL;l>SK?qme#%p!Y4i{NBnGXmj
zK7u;$`CSLyNmg&8v6aQR$(kZ>)0&^0ceqFHHh1mZbdmqE#3#4|X5_8CBS5$r%95c4
zyH6w>jzI*Jbv<_ZFWCacwbIxPlAJ;p^Jw!1eL=LzuORP;ma-Bt2iF^RF|EyZrN;?w
zSe0u<teVx)^1FIDDv3*FVj=jCw|{tCV-GG`Lb8D%pJdeUYcou<{#jj?^{wsiknL2{
zteLuCG&labs}=hxiz;sqBlkbaYX}^<j26Y=mc}V&Fj<YW4v9_ciNjs<xaHbC!`aO1
z6$vHT*8RjPiPn60n#*C3oCLmUxBn4#oHdQ2_f3)6vUE3ms6zPiuzbLS>Bl}?yGd3B
zBCu{2I3)`-q@O70Z80Se4qwRYVgi7gwBK~D%5QsjY~5|qVN87pn&uq|OVZh3?IZ-e
zt<?+fa4uWQjPhyZcx|iSygoBqva1$({wo|Fogci(tEmDJVcP<<%&(UD>BH61SSF^l
z(8CwU>oU7-lhFqYi_!DEB6w@v+k9EBM{ElR(aye=aHP_buovCzXt|ht3w;#-IdS=O
zMLNtStX!<BD@b|c<(p<>ZH&zbFmGTLyC4PFWsB_Rx+%K#WA2Qrk{1_}SlRnGd>J3M
zjoWz<;>I+e6w3J_kabFLSTd1tkxldaLE3-#;!()kD3-*HDf`Ure5Q682@^Bqn?**b
z8)s{r=Ge79p#~}=s91Urf#%o^6F<gdIi638MHKEk<XB~k+}#Tnch4BL@H=%aaTbrl
z=Eke0@>)ak;dVb{*L-7{0l6Za6t_c~3>S<7>6lqg!um@e0ttp>YVm*wvzxk@B7<sH
z7F20wd)IlsG=(p7g80+0`Yc`JvszJ2$`$$j2MR0OQcEhxNs>fGj3J8#vYZ!X-PZh?
zWcqk%)nFhhJ;av}9+@uOQE_i!28H27eU12XL$*m@NywqNGKB<^(ofWweMHqdQ!?=T
zCxg}QmMZe8b01VpT3$PqXp|h<NmCF<aP2^DxoVbmn*=gqd0md{U&7bU{Q)jL8}8`k
zJtcnkhcB#diTOo>UYU6+h?tP`P(&mu^4<6oLgYJtqo`2tPM<e<R+j6}af54giJR;t
zq3t9w8p~AbZLGz`^vRzh+kt@JSx>ItSGt2l8+71~BnM1KrBX2?QK$|?w1kV0{rda&
zpCwW~b!u)mCO4M#K=W=zX_^@GS$@5DvFfDiUT4YCY|v!9%$xFbxh7;+?vUnYMG`se
z8ERL=7I)O%5^JCKACL30vlf6-kYqQq_bL+Ds=etiT}-alM2f`XOkGk#LT@N1%E&r`
z?yb=sQTAdTP;>WuxAR@yT;+ND_f>u`lBR;dppuj^-6rQMN+9gBo70V5&1KChq30Pn
zbEmY7W@NvOFRPHm+AOCO3@KXH*=z0)Bc0^SZYZVx>a@*vE`_twUc@vuBI>PXvtB<Z
zfAlfZJ&M~Jxu-E2UUqBRgy<X#8Z_k{3+o%D_62PUmJ_1h@Va9kBo$TYS|@IdQ(Tjr
zL#eKfOYS~OZ$yv~7>+II^hL8)B$1anA<@?WR_Cb7%Wu>R#m#u{AxESgOJ~*+nqN4A
zdAW{Phl}3TOACg>jchR-qZw^TnU9G<6-`8+l!oo2INp2HjdT?C_TU6@SWXK97_b>M
zQMC6{LhQH2GNsD!LHPA1Z$`1Rg6fX`?&NKoosIjX&HHS7yjOiOxf)Xgc2b1d<VT4q
zQ^}vvf4QS%C%$x<U^ZRMvtpX8OC4it=t(F~R5wk~LyI1KNdi3-;|sJvH|iOy_}Zl0
zZ|fsgF?w+F=RQ&C?)p)Lje^)`z5zh6J5YGLYPA@9Hic@qo2(?U`+Cqv2P=Zc7BxE6
zgubc^9LncT+y{!2oGOgqJu9wO@^3ac56#0_DF5CButeosSUx2cc~7Kdn1{Aqq9S~%
zwzEEUs}M%HVYR;jsZ~xM8zw6b2}NB7jU6O`noukU0=%o$OfBfCEcI9kL*Y-ODrU%&
z(r7A=!!m?Skt;mJ;EH9DPX_H{{El)#Ib*OUG*<=@3_gBgtwxRTanIGHll&Ah$iNC~
z?B#~csHSMJ=nS}xFIz>z!N<>61;|0W1~FP_Wy@ysYO;1mL|jU*{*!2P+RTqCU91cX
zia+A*iMu^@KU1X*uY48Y5O>p_xbIdKkeMpr7?@uKKT*&nAunmkijwx$a~}A#{#Fga
z=$7s=F;BcruZKpQDzy+^!$pahVoB0}qM+y<-C&!-z5r*A^HI_vW8~#*G?`;`mCLTK
z=-n+D?!E6O!~PC-VeS43$hRuVs#N0`+S&U{H9s{#9_<ThTDdK43Q*BPNB)bI&e04c
zWJyuzpVv{~;%)&akp$L$lO(_xipg-J>9x{G6!xbm$GSc7dZ+$+@&jG&CEo+FuQ6aO
z5~$HCa~xd$8#>i{Z3Jql+%IUo4FnKKBOMqfavG4F(;Mps4+8vyqKGoHrd}AmtfKA%
z>m}Zaya!6;j&fTMeNm7BS)iUr2S@aA!f&klqFwO#k(`&d4aI8ROsT?XfkM-7=Y&3s
z$72BWTY5;8F5<*t;Mm7O+C!?h`90zWfQv_BMQcpnosI+-2z5FT?A~x6pjjn~+J(k0
z*pZL+0x!4`?MZKc+NZ!@0?4cZj|)L6=!ce*^8lrVc0s^o9~O1TENEjBC~w?qm^X^5
zXgsO50AU`DIcPvQG5o0BS3--1!`tB4@e^($QR6JR6Ci^2eX8}yGF3*mJ~kvRZ1MiE
zehbBy8apG?X2tZe3H_>;Mif3hfRzMjHrB0|Xx7qvpM(N{>(Ls*AXP%DlA<|2jjnLi
z<EnwQg0h-7nJ?_j^KyBL9v2u9Di%y`n~CCe?)&vpb_YJ4mDG_{hiB9CsO>lwgf@~o
zfTCAjO%52;*^-$$uW;&++&INdUqgIMF))iRf@8X)r3d|h;cNp^gs;@IE~?T(;jr4p
ztTSK|XwZkzGttNyLO2k|3!$6!(!3QbT;z1!*U7+kBtRWgM^vw`<%6YcajZH5aFmuP
zAxg*{>$w8T0Zz#SAdd!UoD&T>QToXy>U|70=Qs&WKzMk5n_5G^WpC%+LdmSK*n64a
zBB~}GQpkXB=Tz`~!a-NFVW+~D13<K2E1tn!6d6q#<*j3YGoMC|ga31o*QBI;$U&P-
z>X2d%;3qHJ{MI@e)n_o74p<D)ZgL&jc{EOT8zrT+y{Qf}b1<5Cp>w@cQ?Za?!Dfj&
zpnLvzMm<9von0$vQhT=qShFsFYvOHBDyfDSAc#f1Y9pMBvS_v`T%V0e0qn&hJD4^H
zN>p00IvrBrH3@o7_{*nT>X3)HBa0hBGlbBBtD=S5S9X<so>)|sv*COn)2I9H;?)cH
z{QT9$2lxKs-T9?^`Qy6__w4z_`}3!OUniWj0L4=>D%MCjhmx*x(4iqzgHSqhFTz(L
zz#tCFMj41q&_IFA48hgo8%a)X&B0c4-y4GArfDBK`^iewF<m31U;{<Bsq(GxA3Z^N
z(Q~^fuI2{999ro=h!qbu3&RbO=}<I6vdJ$bYoFwByZ*QhYaO_oJj}Na5xFPu6hs-=
zA?6~%-qi(JfHCFP(Me1|HJ0c+u&vB2iN{Q8XgsxHp8y@mo=&d_DqQ4y6#@%n2c4A<
z89);;;H02((o5r0sWb^8=2R0#G-We6<w#u0xbek^ti<UQU6`&vdsGV+_S4@yk~s?M
zYqNDr8YVfK!4<PZo#%4q6}O;x9(G~m_mEP`_H@kmv@{6XcvB?w>ad*dLF78(36UXw
z`)?FJ{SDP`6TNMhzMv9pc3ZUb^@!*^M*y6vCPXGs+T590*ScP*v>KQ7#Ny5bT-thE
zPyBK615ZNP#Ph;J{DFeZi^rme0+%AF4k%PoXaZrIfXqUbD=@pyD#>IBcCiNaW#*X=
z-dhy#RU4iV%$8mosyR4g4_3Gp%u>wiTlxs`O{|JBECVDfuwra%-QLR~9jp>pnwEk@
zh%yoQ*_*e6wEl+rmgDsnol7)I>Ld`X5d#e<_S9#74J7y;jI2QO-J5{MLQ0s!h^2I_
zg)=w|kKNyfih{USP0M^r;eT{~a&<z?=u?F^e2?+=2D;;DX_vckJ0|3z&ryM;*c6wZ
zy}GzOckjM`c}d8|-4A81%d91-9O5)$T}3OsNX;oR_$)TylaARj)kL5)ip4wL62mox
z|6;WOb~exny8=<z>?1cKa^z`Py&Z@8um7%Z%0GSQa7;KDt&mz_VF=J9(u%2#pDLMd
zFcwS6$PB1=@k_2#G~$ek!&_2o2y=<rmQF%GMwsaq1KAOZ*XqV_urX(;az)C#a2ALe
z0^a&FFP#Sy&s5!$SS8+c!{IIiFfF_1Y<r~|)vWOmcegPUl+uDSbFb#6;8a)%!_F(Y
zJht<`O5LuA`qtVV37A?yeCp=p%9372*&49fVu`*BGf1h7P275y^#4XPYkb1ROB9My
zjM)kZDW43|ZX%1+qeVKf!LFXF__)o-Hj$KeA3i{NT%=13DlB<n!yPtRlxPlF^{=%{
z9ahj#1IUrJg)HU1E(z+D+}n6yxgp7DMSDz!V)N6R;=q)8goEoxRHWD9LQOx0Wk6#^
zY-%|CCFD?w=2c6nBz8|uwo>+vUa9?J4WvU^e~*FPC1uO<MUqglS4InrNg{phUQ&g@
z9SH{nS2c>SkPW0%Ai#pV0<dmZ!q{!!UeReX?o=-4S#+Mznm|*-4GSj1_R`S)eVcm-
z59%6HZyeOb3Bkt7y~dTUK(qpLlou$|ZT^HllTuzYWlDtUO6@2N0cOkQB(v$F6Aryj
zdm1P6Zn3%du;OQbK0aI}UqYXEu2yac8r`w4yD_!IZZukQGsrQPb;qVY&0v)4_U7(h
z3A_xjlAkU(`KwNcY7evo7>TPpNKQb0hw>$y>7&V2aT)|U0)9As>m&Qi3KfHCqsI0r
zKKqw1PX2uI@Z^hs!Ch4PjAS8y(IDyRO*K7HaD>`uku}$=Ws=uG-03p~{OCX=!4R<F
zU0|0x94oXeZKjjW9GVA#PEnibq#;omx<ZP8?LDh}Log*vghG@sSu`Hi|4nv0&GD#-
zWdxODUgbxE)?GY+9N+}WlK0r<-O3x>V^eVh*+o;O7YQ}HiVfz{LW)xwIP4wfaIch-
z#KKNgNok_ZLPHe4r2nba8n;Ch0q2QimG1^Y*&R`GPmxot36`F>6cQczHNxMbv5}F0
z)lV;1<FJfYE2}2LTl;?U9}~7GlyAAnK+<XN{|~J6;0y&8mbl-9z9CKOQmX@++oU3@
zRZl^+2{KL7t`yU97-C(NiYS{%JoACmXW}4*RIAFqtTj$p8MJ6mZRAj2hSdheZ<d5x
zh4m1zOgUs<iVdJ{1%kV>NiQW672eyoI|TnCI4Hl7=G5W?U<^ve4=NEukm1V(J04Vh
z`qnA|zS3CD;8flhBa#1r@NDFW-O7YDLqwHfpu*;lG$Yq0d-guSH%tbxE1e-FyYueU
z$3?(IooI5c@GVu(;h>QSpwTx?rLKS_`u4{ImO+00d;)UUmAYGcXD_5SNyxdsYW$M&
zCsQR|rg?O6BF_*E;j3~iO46UNnr8hpqGSWkhb6B+)v!a_E-S75_K-dyL8lFhWB#ai
zK)`Qarw~HDAB$Jt2dYj3)}V#8<_bGd+yalGhPEzXmq<7=>2(w?88P~$K0TO(nG*`|
zM<0xh*Tj3;&2V+1%gwB?FqfhiwPSB_U}N9`R!YOXLv45|CWO$?Z`el!G>tLy2J)Nh
zBn~hAH&$t@a-ln>sWSYbnc4iEYDNhnH80!7Br^V#Lp%8AJ&N;YJBZ$C+ZfaCxR?lf
zD+`{h@g{sn`V!_)Of+&q`t?17oyJq6So<)b?8&5mL!9#Cwiu&)10Y?4TFZg8Mnxpq
zx2I7jGOPaMs1ud^3<Yo2ugLR8z9;pgHK!+6tmpRVj83u4A~c7JuOuc1BY(-V((DXi
zt~`W)Y{;GxEy5lH0zIjHuOeha*8u1et#IF!o1YEQFLy|fe0YDzwX&HkDIu==be)(?
zA_i6E5r-sSJ4epSk)8`$wo$%ufPHvB#f=y__ua}{&jv@^NnnQ?#2Gj_+N<b8E=Y5J
zlFpEcWT-wK3&H?>v78bOE!}WavgsYp!-z2IozPt^>DFk>0weOP;r9PAagXb|Bfe+L
zxJ|Qbtcfyc?dQtWPF*o&L$l}y3U4<Xx_~GhCQ|l5OBRQ!<p^<J)9}bNY`?L2JWo#n
z;2LpOby!x1XBup7=n!`8*!oj~AJG4(aME?r2A@71Wb63h#7pR?&8dJ1D>k*1JYSfN
za!Zw-0ZLSpGQ$g1!0c1Q<K)21%?MJ19BYRK6_aA~p?3F`i{7AzrXKF8bOX)ds9a1v
zL`&5LvLl6x!Q*Su?-SowR4GkXej5ZZ=;I!di}X;J`DC)mp4K+Vy)A+GUK~goihTB3
z)zEO)AI$@)l(Vv9qex17NG~A&t2VMJlwY%vC?*^ffrk^uu%qE#hLOPsp^uh#DZr;K
zq-Vk$h_(eu32oo521~*)5DpEz!+ErOck$-s*^kd&pW9-pFV!V41(8vo+kxzGwL1*A
zIy5z>#lgAb+=_5$C}Gut2e)R#ZR#u>a)9aCBPo8!;V$t6%?=n*m6ybo;;Iiv;dFVn
zvROERUpl(+Be9za*bSS-A$2mxaG-<Tq<uei8nUNO7=w)2Q-r?2eo1wk-wudGqvm%w
zgTe(lI!a{(lXG-JIsoHQF8>6az)w;^=cB8{r?VV82HTpEOgFL<3k^(qK$Z$ykLEn%
zP#}&YzxeSI2i^SH^<g7u|GeIqsP*fjQQxs{lemA!L%JN3Px3tmnV8>Rr^kAJ!z}=N
ze#_-|-u5TQ%q9;K?<J?2-{uFl(%dHRewLT$9oVOGLwD+`{K-&!TJ1wn@K7TGZrk1X
zX|I&GgZaL=HY|vY9I05;@awETc6m1$dQ-f{|BgJ$()h$io}%Spo(F2PU+0?ar4=<E
zp6PL8_goKNbD;6Woa5Ub$ny_K{~I~%jHxmAM91|zU%$}<S2jC9H~D0#cRSboz;kNI
z`#jAQxhI2swsZM&W}_<ifnieW%4Nnu+T_|jQ8eAT^al=#f5@&1rcka6s1}QzOa3Z=
z?lq6<@&Grx{h>K=PwdI8-7A^gbeDRrnk(%2>Tk<+lL<5SqPQfDvDr(1wCA3L2)04U
z1nWaUk{ppnCC}MBE1FG396P!Lo7#P@!93@m<YQjTD~JSR4^{hjKUvhZV-KFP)$mO{
z=4om~?oW<A32qj1v~UgUYE=lpL2wo0LrqkApfM}stf7o#;f-WoEFTAKfVGu@Ap$gF
zw53nBN^P*j=`j#{;@@ToIGWS0JRE4<3q{@J+N80N?;M<^1FEeZyPy2YP8y61Lzr6h
zvbw><rv?pqKz3`f2!kZhVZ`Ci(=g{~WBB1|NgXFkEyiP@aPJ)-gfnLGM)hs*lwvlQ
zBU>^kjd@$>q4XeDXJG<3+vKZKM<oJKHqatEno(2FItDiFY|~m*WS>sm`!jJ?8U<AA
zdw-p=#DX)?;Nd!*sn~Iu*#Az?2pN!ka_`_k{A;~OFTl`se_aOOb;<KjxPT=W4Ws!}
zW_A^JPt^GE{rTJTXO}i8;k-dXAdaI87}6Lj^(9*%&EqN}KE!slb8g1dF5%8SgH{Hn
zd2{k+cyKM)<zOq|LE#ZO-3V%Wj<lU)&H<0aqbvFqAro<Rv?T-pLZiP05$Pud83UWj
ziL2G7k6K_LRF-5Np@om@3$YDVW>q&c_B`_E<xPbE*yx$kb|kUoiVej-8~}kIFFqVI
z1c30ryLfwk%uttm`OdvPzr1+${`^H4Vfb~sKf)nFV!VO`&M`CR1JL2FWrJ~!V_Y!l
z^p>8ajwDe=y9y6PJok_M?r1la(eeP<2AJupVY9?McBr4N{YVhIENQJsBu?V$kGb9+
z%_cSW>pf6(r#3g#xZ`<|%-s&?M}vR$*1=x&Ogw@-e6JU8*dVB@26Z>3l1^Q=CF2cS
zr@>}ZWOm$JxUS#_L$|{{y7Xrls7uw4;=l;sCjiBSpv^rVA8ZHnC3$%@#;Z5ERG<u@
zzV=-&^*^U}8UPG2(<YQ75%~!R)xnD+%?+$Bgco}{h701#O?q+VE;qkXAIvh^@Ti^!
z!sZL+DyEumvW7s%gkJoSi(sEN2e@3;ihCB6gM_`yh4b~8Qv<J&#4gKyBo3n9sL|rO
z@Leh}j!Zk-rQxtN+giC(k!^jyuLRq=WTW)j`fj)8+P)x_T3cUimRKop><^FM{q3zn
z=5|;~V%e)NayV61uP*sP?}_^zF*2zXL-M!a>eY;eGLXl!VV3JH;V5=-B4a0GKHA6S
zzslWMEtWQ-Y$Py9rT)-{hvmBYu^g<p@`Js4(}llcqt~2D@jP8W97%2X_Z1KH5qhj>
zus<&2i2;nSIEJH4W-eqPrGXnoMt((LkCwk{<97!5E^9UEx0|+!PB6zw4KKo9oUj5*
zM^t@i<DoY4H}S+{?!YGL=>X{j=uyKjEYab1F)3&*6#9ISi96e7*B^1xeSdcFnR~?%
zPU+tB=#K_yD4RZ0BS^Qu@%OLzlzK%^vUT%E?n~bKuSa~!9D^;Wm?-y`wRW^a$YGf1
zOZS!g#(nY7eeo54Ey_Lo(mnjf{qdpu<5%v>2ky%+-Iw3EKRtAR`pSLvz<tHl-?*<I
Sy02OE;6CKe^(VDHJNREnOC7xc

literal 0
HcmV?d00001

diff --git a/t/t9010/newdata.done b/t/t9010/newdata.done
new file mode 100644
index 0000000..f53c3bf
--- /dev/null
+++ b/t/t9010/newdata.done
@@ -0,0 +1,522 @@
+APACHE COMMONS PROJECT
+STATUS: -*-indented-text-*- Last modified at [$Date$]
+
+Background:
+    o IRC channel #apache-commons on irc.openprojects.net
+      traffic is logged to <URL:http://Source-Zone.Org/apache-irc/>
+      so that the content of interactive discussions is available
+      to everyone
+
+Project committers (as of 2002-10-27):
+    o commons:
+      aaron,coar,donaldp,jerenkrantz,fitz,geirm,gstein,jim,striker
+    o commons-site:
+      aaron,coar,donaldp,jerenkrantz,fitz,geirm,gstein,jim,striker,
+      sanders,nicolaken
+
+Release:
+    none yet; still defining mission :-)
+
+
+Resolved Issues:
+
+    o Commons is a parent of reusable code projects. These projects
+      may be used by other projects of the ASF, but it is not a
+      requirement.
+
+    o The Commons will be language-agnostic.
+
+    o Projects that are "in scope" are defined as:
+    
+      - Existing components that are, or would be, useful to multiple
+        projects
+
+      - If a component does not fit the (TBD) goals of Apache Commons,
+        then it is not considered "in scope" just because it has no
+        other home. In other words, the Apache Commons is not a place
+        of last refuge if the component does not match the Apache
+        Commons' goals.
+
+      - Reusable libraries
+        [ gstein: we should expand this definition for the mission
+          statement; examples provided were serf and regexp ]
+
+      - Components that do not fit cleanly into any other top-level
+        project, but they do fit the goals of Commons.
+
+    o Voting will follow the "standard Apache voting guidelines"
+
+      [ be nice to refer to an Incubator doc here ]
+
+    o All code donations [to the ASF, destined for Apache Commons]
+      arrive via the Incubator, unless the Incubator states they can
+      be placed directly into Commons.
+
+    o Existing Commons committers can start new components without a
+      detour to the Incubator. These new components must be approved
+      by the PMC and must meet the (TBD) goals of Apache Commons.
+
+
+Pending issues:
+    o Coming up with a set of bylaws for the project
+
+    o Enabling Reply-to on the @commons lists
+      (pmc@ will *not* use reply-to munging, but user lists
+      will be determined by user majority; this item applies
+      to lists for which the decision has not yet been made)
+      +1: aaron, coar, donaldp, geirm, acoliver, mas, bayard, sanders
+      -1: fitz, gstein, jerenkrantz, striker, jim
+
+    o The name 'Commons' has caused some heartburn with the
+      Jakarta community because of the Jakarta-Commons project.
+      Should we rename to avoid conflicts and keep the peace?
+      Conflicts would include Java namespace as well as
+      philosophical aspects.
+      +1: 
+      +0: coar (i'm willing)
+      -0: jerenkrantz, donaldp, striker, gstein, fitz
+      -1: sanders
+
+    o If we rename, to what?  What words/names describe our
+      purpose?
+      - toolbox
+        +0: gstein (I'd be +1 but for the confusion with the existing
+                    Apache Toolbox project, but *really* like this
+                    name)
+      - toolchest
+        +0.5: gstein
+      - tools
+        +0: gstein
+        -1: donaldp (tools are different to components)
+      - components
+        +0: gstein (a bit long)
+      - util
+      - library
+        +0: gstein (doesn't fit well with perl/python "modules")
+      - suite (sweet?)
+      - belt (as in bat-belt or tool-belt)
+      - mcgyver
+      - foundry or mill
+        +1: sanders (maybe too 'SourceForgeesque')
+        -0: donaldp (If reorg goes through we may have multiple
+                     foundaries or federations for different "concepts")
+      - federation
+      - share or shared
+      - stuff
+        +.3: fitz :)
+      - ?
+
+    o Style for the mailing lists:
+    
+      One community mailing list, with specific breakouts:
+        +1: fitz, jerenkrantz, sanders, coar,
+            donaldp (lets start here and evolve)
+      +0.5: mas
+        -0: aaron (too early)
+      
+      Topical mailing lists:
+        +1: gstein, scolebourne, acoliver, striker
+        -0: aaron (too early), jerenkrantz
+      -0.1: mas, sanders (too early for this), donaldp
+        -1: coar
+      
+      Per-language mailing lists:
+        -0: aaron (too early)
+      -0.1: mas
+        -1: gstein, sanders, fitz, jerenkrantz, striker, coar
+
+      Per-component mailing lists as a default (breakouts will create
+          these as a matter of course, this is about the default)
+      +0.7: mas
+        -0: aaron (too early)
+      -0.9: sanders
+        -1: gstein, fitz, jerenkrantz, striker
+
+    o A number of very valid issues have been brought up on the
+      list. We need to figure out how the Commons Project will
+      deal with each of these, in terms of new components and
+      how those components will contain code projects. This list
+      is only meant to keep record of all the issues:
+
+        - Releasable pieces
+        - Release rules
+        - Voting scope
+        - Directory structure and naming conventions
+        - Coding style
+        - Build system consistency (or inconsistency)
+        - Namespace issues (esp. w/ java)
+        - Language vs. Functional
+
+    o Default commit privileges
+    
+      - Commons-wide
+        +1:
+        -1: gstein, striker, donaldp
+      
+      - Per-component
+        +1: gstein, striker, donaldp, jerenkrantz
+        -1:
+      
+      - Per-component with self-chosen aggregation
+        +1: gstein, donaldp
+        -1:
+
+    o Granularity of CVS repositories for components (this excludes
+      commons-site)
+    
+      - Commons-wide
+        +1: gstein, donaldp, jerenkrantz
+        -1:
+      
+      - Per-topic
+        +1:
+        -0: gstein, donaldp, jerenkrantz
+        -1: 
+      
+      - Per-component
+        +1:
+        -1: gstein, donaldp, jerenkrantz
+
+
+Project Mission:
+
+What is the project's mission?  Our statement of goals/mission/vision
+should arise from the answers to the following and other questions:
+(jim notes that defining something after the fact seems very backwards
+ and broken; gstein notes that we're refining the board-provided
+ charter)
+
+    o Should commons have an sandbox component to ease infrastructure
+      burden on smaller code bases?
+      +1: coar, donaldp, jerenkrantz, gstein, sanders (non-binding)
+      +0: fitz
+      -0: striker
+      -1: jim (the PMC is about reusability, not sandbox),
+          aaron (what jim said; and go see incubator)
+
+    o What types of components would be appropriate for this project? 
+      ("in scope")
+
+      - Tools that help/promote reusability?
+        Hypothetical: ant, jlibtool, ASF-based autoconf
+        +1: jerenkrantz, gstein, striker, fitz, sanders (non-binding)
+        -0: donaldp (prefer a tools PMC for that)
+        -1: aaron (too broad, don't belong here)
+
+      - Development frameworks?
+        Hypothetical: avalon
+        +1: fitz
+        -0: donaldp (how do we determine this given we would prolly
+                     accept it if it was new?)
+        -1: gstein (the avalon components, but not the whole bugger),
+            striker, sanders (non-binding)
+
+     - Components that fit the (TBD) goals of Commons, have a more
+       "logical" home elsewhere in the ASF, but were rejected by that
+       home?
+        +1: gstein, donaldp
+         0: striker (on a case by case basis, taking reasons for rejection
+                     seriously into account. Abstain from vote until
+                     rephrased),
+            fitz (what striker said), aaron (what fitz said)
+        -1: jerenkrantz, sanders (non-binding)
+
+      FOLD BELOW VOTES INTO ABOVE? (i.e. eliminate the "donation" wording)
+      - Donations that could fit but have a more obvious (proper) home which
+        has already rejected it?
+        +1: coar, donaldp, gstein (note the "might fit" term)
+        -0:
+        -1: jerenkrantz, jim, aaron, striker, fitz
+
+      - Existing ASF components whose committers believe that they
+        are a better fit under commons and the commons PMC agrees?
+        (If this component were brought up as new, we would accept it.)
+        +1: coar, donaldp, jerenkrantz, striker, gstein, fitz
+        -1: jim (by this definition httpd could be in commons)
+                (gstein says: see the "if" part; we wouldn't accept httpd)
+                (jim says: until we better define what the PMC would or
+                 would not accept, then this seems too wishy-washy to me)
+             (gstein says: jim, you're blocking closure on this;
+              how would you refine the phrasing here; the intent
+              here is to accept components from the other Commons
+              projects or projects with reusable component),
+             aaron (we need to differenciate ourselves from other
+                    libraries first, namely APR)
+
+      - Packages being worked on by Apache developers, with a clear
+        affiliation, that can't or won't be bundled?  (E.g., an
+        httpd module)
+        +1: coar, donaldp
+        -1: jerenkrantz, striker, gstein, fitz, jim, aaron
+       CLOSE THIS? (as "not passed"; what is a good way to phrase this?)
+
+      - Should we have a minimum bar of entry for components?
+        +1:
+        -0: donaldp, gstein
+        -1:
+       
+      - Should we have a minimum set of requirements before components
+        are released?
+        +1: donaldp, gstein (mixed, see below), striker
+        -1: jerenkrantz (what is released?)
+
+      - If yes to above then which things should be part of minimum
+        requirements?
+
+        documentation: require basic overview and user docs
+        +1: donaldp
+        -0: gstein (recommend highly, but let the committers determine
+                    what is right for the component),
+            striker, jerenkrantz
+        -1:
+
+        uptodate website: require website be updated to latest release
+                          but may still host previous release docs.
+        +1: donaldp, gstein, striker
+        -0: jerenkrantz
+        -1:
+
+        unit tests: (okay so this will never get consensus but ...)
+        +1: donaldp
+        -1: gstein (unit tests should be recommended, but not
+                    mandated; I also find it unreasonable for initial
+                    development/pre-alpha releases, but it can make
+                    sense for "final" types of releases),
+            striker, jerenkrantz
+
+        versioning standard: derived from
+            http://apr.apache.org/versioning.html
+            http://jakarta.apache.org/commons/versioning.html
+        +1: donaldp, gstein, striker, jerenkrantz
+        -1:
+
+        release process: derived from
+          http://jakarta.apache.org/commons/releases.html
+          http://jakarta.apache.org/turbine/maven/development/release-process.html
+          http://cvs.apache.org/viewcvs.cgi/jakarta-ant/ReleaseInstructions?rev=1.9.2.1&content-type=text/vnd.viewcvs-markup
+        +1: donaldp
+        -1: gstein (we should provide "best practices" but allow each
+                    components' committers to define their rules),
+            striker, jerenkrantz
+
+        deprecation process: (java specific?)
+          http://jakarta.apache.org/turbine/maven/development/deprecation.html
+        +1: donaldp, gstein (I see this as part of the "versioning"
+                             process, and we can provide best
+                             practices here)
+        -0: jerenkrantz (kinda sorta versioning, but not quite)
+        -1:
+
+        CVS/Subversion branching:
+          http://jakarta.apache.org/turbine/maven/development/branches.html
+        +1: donaldp
+        -1: gstein (we should provide "best practices" but allow each
+                    components' committers to define their rules),
+            striker, jerenkrantz
+
+
+Candidate Projects:
+
+    o APR's serf project has voted itself to move into Commons.
+    
+      - Should the PMC accept it as fitting the Commons goal?
+        +1: gstein, fitz, jerenkrantz, striker, donaldp
+        -1: aaron (no such thing as "the Commons goal", how can it fit it?)
+
+      - When should it move?
+
+        Whenever it likes:
+          +1: gstein, sanders, jerenkrantz, striker
+          +0: donaldp (+1 if we use subversion, but if using CVS 
+              we should hold off until structure is decided upon)
+          -1: aaron (after we know why it fits)
+
+        Give us a while:
+          +1: fitz (what's the hurry?), aaron
+          -0: gstein (we're only talking about a small seed of a
+                codebase; it won't get in our way as we complete the
+                charter), striker
+
+      - Where should the CVS code be located?
+      
+        commons/serf   (each component under top-level)
+            +1: sanders (works well at jakarta-commons)
+                fitz (Please don't mix interface and implementation 
+                  of commons!), aaron
+            +0: jerenkrantz
+          -0.5: gstein
+            -1: donaldp (makes it difficult to update all related 
+                         projects with a single sweep)
+
+        commons/components/serf   (all components under this dir,
+            leaving the top open for other non-code items)
+          +0: gstein, striker, donaldp (is this just dev with a 
+                                        different name?
+                                        (gstein says "yes"))
+          -1: fitz, aaron, jerenkrantz
+        
+        commons/clients/serf   (topical-groups under top-level)
+          +1: gstein, jerenkrantz
+          -1: fitz, aaron, donaldp
+        
+        commons/dev/serf  (all components under "dev")
+          +1: gstein, donaldp (if we are having a single 
+                               monolithic repo for all commons)
+          -1: fitz, aaron, jerenkrantz
+        
+        commons/bootstrap/serf  (serf is very early stage, so maybe we
+            have a "bootstrap" area; this is different from Incubator
+            since the existing committers do not need "training")
+          +1: gstein, donaldp
+          -1: fitz, aaron, jerenkrantz
+
+        commons/???
+
+        commons/c/serf (separate out component based on language
+                        and then have a flat structure underneath)
+          +1: donaldp
+          -1: jerenkrantz
+
+      - What mailing list should it use for dev discussions?
+      
+        general@commons.apache.org:  (one group for all discussion;
+                                      dev and non-dev alike)
+          -0.5: gstein
+            -1: striker, aaron, jerenkrantz
+        
+        dev@commons.apache.org:  (one group for dev discussion;
+                                  general@ remains for non-dev)
+          +1: gstein, fitz, sanders, jerenkrantz, striker, donaldp
+          
+        clients-dev@commons.apache.org:
+           (this is really TOPICNAME-dev@ where I preselected
+            "clients" for TOPICNAME; this question is whether this
+            style would be appropriate)
+          +1: gstein, striker
+          -0: sanders, donaldp (maybe in the future but too early),
+              jerenkrantz
+          -1: aaron (what is "clients"? I'd probably be +1 if I knew
+                     what that was)
+
+      - Note: serf has no web site, so there isn't a need to figure
+        that out right now.
+
+
+Assets:
+    DNS:                commons.apache.org
+    
+    Mailing lists:      general@commons.apache.org
+                        announce@commons.apache.org
+                        pmc@commons.apache.org
+                        cvs@commons.apache.org
+                        
+                        [ core-cvs@commons.apache.org in case we
+                          create a commons-core CVS module ]
+
+    Web site:           http://commons.apache.org/
+    
+    Repositories:       commons        (code, info, etc)
+                        commons-site   (the web site)
+
+
+PMC Members:
+
+    Aaron Bannert <aaron@apache.org>
+    Ken Coar <coar@apache.org>
+    Peter Donald <peter@apache.org>
+    Justin Erenkrantz <jerenkrantz@apache.org>
+    Brian W. Fitzpatrick <fitz@apache.org>
+    Jim Jagielski <jim@apache.org>
+    Geir Magnusson Jr. <geirm@apache.org>
+    Greg Stein <gstein@lyra.org>
+    Sander Striker <striker@apache.org>
+
+    Note: Ken Coar is the Chair
+
+
+PMC Members, pending Board approval:
+
+    none yet
+
+    [ this may become obsolete; the Board is discussing a way for the
+      Chair to directly alter the PMC membership; until then, however,
+      we need PMC members ratified by the board, and this tracks them ]
+
+
+Committers:
+
+    none yet [still defining mission]
+
+
+Invited Committers:
+
+    none yet
+
+
+Current mission/charter as approved by the board:
+
+    'The Apache Commons PMC hereby is responsible for the creation
+    and maintenance of software related to reusable libraries and
+    components, based on software licensed to the Foundation.'
+
+The complete text of the resolution that was passed is:
+
+       WHEREAS, the Board of Directors deems it to be in the best
+       interests of the Foundation and consistent with the
+       Foundation's purpose to establish a Project Management
+       Committee charged with the creation and maintenance of
+       open-source software related to reusable libraries and
+       components, for distribution at no charge to the public.
+
+       NOW, THEREFORE, BE IT RESOLVED, that a Project Management
+       Committee (PMC), to be known as the "Apache Commons PMC", be
+       and hereby is established pursuant to Bylaws of the Foundation;
+       and be it further
+
+       RESOLVED, that the Apache Commons PMC be and hereby is
+       responsible for the creation and maintenance of software
+       related to reusable libraries and components, based on software
+       licensed to the Foundation; and be it further
+
+       RESOLVED, that the office of "Vice President, Apache Commons"
+       be and hereby is created, the person holding such office to
+       serve at the direction of the Board of Directors as the chair
+       of the Apache Commons PMC, and to have primary responsibility
+       for management of the projects within the scope of
+       responsibility of the Apache Commons PMC; and be it further
+
+       RESOLVED, that the persons listed immediately below be and
+       hereby are appointed to serve as the initial members of the
+       Apache Commons PMC:
+
+              Aaron Bannert
+              Ken Coar (chair)
+              Peter Donald
+              Justin Erenkrantz
+              Brian W. Fitzpatrick
+              Jim Jagielski
+              Geir Magnusson Jr.
+              Greg Stein
+              Sander Striker
+
+       NOW, THEREFORE, BE IT FURTHER RESOLVED, that Ken Coar be and
+       hereby is appointed to the office of Vice President, Apache
+       Commons, to serve in accordance with and subject to the
+       direction of the Board of Directors and the Bylaws of the
+       Foundation until death, resignation, retirement, removal or
+       disqualification, or until a successor is appointed; and be it
+       further
+
+       RESOLVED, that the initial Apache Commons PMC be and hereby is
+       tasked with the creation of a set of bylaws intended to
+       encourage open development and increased participation in the
+       Apache Commons Project.
+
+#
+# Local Variables:
+# mode: indented-text
+# tab-width: 4
+# indent-tabs-mode: nil
+# tab-stop-list: (4 6 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80)
+# End:
+#
diff --git a/t/t9010/src.diff0 b/t/t9010/src.diff0
new file mode 100644
index 0000000000000000000000000000000000000000..50a49fc1065a313b8592645614e3779261c981fe
GIT binary patch
literal 74
zcmWFz^J8FWobJ&$y_(yQVFD||Mux_A*9puDK#;ExT$)stT2!2wpQlisnUkZCl&Vl#
WoSLGLmS3a*lSt0bD=Es);Q|10>loAk

literal 0
HcmV?d00001

diff --git a/t/t9010/src.done b/t/t9010/src.done
new file mode 100644
index 0000000..56b2572
--- /dev/null
+++ b/t/t9010/src.done
@@ -0,0 +1,522 @@
+APACHE COMMONS PROJECT
+STATUS: -*-indented-text-*- Last modified at [$Date$]
+
+Background:
+    o IRC channel #apache-commons on irc.openprojects.net
+      traffic is logged to <URL:http://Source-Zone.Org/apache-irc/>
+      so that the content of interactive discussions is available
+      to everyone
+
+Project committers (as of 2002-10-27):
+    o commons:
+      aaron,coar,donaldp,jerenkrantz,fitz,geirm,gstein,jim,striker
+    o commons-site:
+      aaron,coar,donaldp,jerenkrantz,fitz,geirm,gstein,jim,striker,
+      sanders,nicolaken
+
+Release:
+    none yet; still defining mission :-)
+
+
+Resolved Issues:
+
+    o Commons is a parent of reusable code projects. These projects
+      may be used by other projects of the ASF, but it is not a
+      requirement.
+
+    o The Commons will be language-agnostic.
+
+    o Projects that are "in scope" are defined as:
+    
+      - Existing components that are, or would be, useful to multiple
+        projects
+
+      - If a component does not fit the (TBD) goals of Apache Commons,
+        then it is not considered "in scope" just because it has no
+        other home. In other words, the Apache Commons is not a place
+        of last refuge if the component does not match the Apache
+        Commons' goals.
+
+      - Reusable libraries
+        [ gstein: we should expand this definition for the mission
+          statement; examples provided were serf and regexp ]
+
+      - Components that do not fit cleanly into any other top-level
+        project, but they do fit the goals of Commons.
+
+    o Voting will follow the "standard Apache voting guidelines"
+
+      [ be nice to refer to an Incubator doc here ]
+
+    o All code donations [to the ASF, destined for Apache Commons]
+      arrive via the Incubator, unless the Incubator states they can
+      be placed directly into Commons.
+
+    o Existing Commons committers can start new components without a
+      detour to the Incubator. These new components must be approved
+      by the PMC and must meet the (TBD) goals of Apache Commons.
+
+
+Pending issues:
+    o Co    o Subversion will be used for version controlComing up with a set of bylaws for the project
+
+    o Enabling Reply-to on the @commons lists
+      (pmc@ will *not* use reply-to munging, but user lists
+      will be determined by user majority; this item applies
+      to lists for which the decision has not yet been made)
+      +1: aaron, coar, donaldp, geirm, acoliver, mas, bayard, sanders
+      -1: fitz, gstein, jerenkrantz, striker, jim
+
+    o The name 'Commons' has caused some heartburn with the
+      Jakarta community because of the Jakarta-Commons project.
+      Should we rename to avoid conflicts and keep the peace?
+      Conflicts would include Java namespace as well as
+      philosophical aspects.
+      +1: 
+      +0: coar (i'm willing)
+      -0: jerenkrantz, donaldp, striker, gstein, fitz
+      -1: sanders
+
+    o If we rename, to what?  What words/names describe our
+      purpose?
+      - toolbox
+        +0: gstein (I'd be +1 but for the confusion with the existing
+                    Apache Toolbox project, but *really* like this
+                    name)
+      - toolchest
+        +0.5: gstein
+      - tools
+        +0: gstein
+        -1: donaldp (tools are different to components)
+      - components
+        +0: gstein (a bit long)
+      - util
+      - library
+        +0: gstein (doesn't fit well with perl/python "modules")
+      - suite (sweet?)
+      - belt (as in bat-belt or tool-belt)
+      - mcgyver
+      - foundry or mill
+        +1: sanders (maybe too 'SourceForgeesque')
+        -0: donaldp (If reorg goes through we may have multiple
+                     foundaries or federations for different "concepts")
+      - federation
+      - share or shared
+      - stuff
+        +.3: fitz :)
+      - ?
+
+    o Style for the mailing lists:
+    
+      One community mailing list, with specific breakouts:
+        +1: fitz, jerenkrantz, sanders, coar,
+            donaldp (lets start here and evolve)
+      +0.5: mas
+        -0: aaron (too early)
+      
+      Topical mailing lists:
+        +1: gstein, scolebourne, acoliver, striker
+        -0: aaron (too early), jerenkrantz
+      -0.1: mas, sanders (too early for this), donaldp
+        -1: coar
+      
+      Per-language mailing lists:
+        -0: aaron (too early)
+      -0.1: mas
+        -1: gstein, sanders, fitz, jerenkrantz, striker, coar
+
+      Per-component mailing lists as a default (breakouts will create
+          these as a matter of course, this is about the default)
+      +0.7: mas
+        -0: aaron (too early)
+      -0.9: sanders
+        -1: gstein, fitz, jerenkrantz, striker
+
+    o A number of very valid issues have been brought up on the
+      list. We need to figure out how the Commons Project will
+      deal with each of these, in terms of new components and
+      how those components will contain code projects. This list
+      is only meant to keep record of all the issues:
+
+        - Releasable pieces
+        - Release rules
+        - Voting scope
+        - Directory structure and naming conventions
+        - Coding style
+        - Build system consistency (or inconsistency)
+        - Namespace issues (esp. w/ java)
+        - Language vs. Functional
+
+    o Default commit privileges
+    
+      - Commons-wide
+        +1:
+        -1: gstein, striker, donaldp
+      
+      - Per-component
+        +1: gstein, striker, donaldp, jerenkrantz
+        -1:
+      
+      - Per-component with self-chosen aggregation
+        +1: gstein, donaldp
+        -1:
+
+    o Granularity of CVS repositories for components (this excludes
+      commons-site)
+    
+      - Commons-wide
+        +1: gstein, donaldp, jerenkrantz
+        -1:
+      
+      - Per-topic
+        +1:
+        -0: gstein, donaldp, jerenkrantz
+        -1: 
+      
+      - Per-component
+        +1:
+        -1: gstein, donaldp, jerenkrantz
+
+
+Project Mission:
+
+What is the project's mission?  Our statement of goals/mission/vision
+should arise from the answers to the following and other questions:
+(jim notes that defining something after the fact seems very backwards
+ and broken; gstein notes that we're refining the board-provided
+ charter)
+
+    o Should commons have an sandbox component to ease infrastructure
+      burden on smaller code bases?
+      +1: coar, donaldp, jerenkrantz, gstein, sanders (non-binding)
+      +0: fitz
+      -0: striker
+      -1: jim (the PMC is about reusability, not sandbox),
+          aaron (what jim said; and go see incubator)
+
+    o What types of components would be appropriate for this project? 
+      ("in scope")
+
+      - Tools that help/promote reusability?
+        Hypothetical: ant, jlibtool, ASF-based autoconf
+        +1: jerenkrantz, gstein, striker, fitz, sanders (non-binding)
+        -0: donaldp (prefer a tools PMC for that)
+        -1: aaron (too broad, don't belong here)
+
+      - Development frameworks?
+        Hypothetical: avalon
+        +1: fitz
+        -0: donaldp (how do we determine this given we would prolly
+                     accept it if it was new?)
+        -1: gstein (the avalon components, but not the whole bugger),
+            striker, sanders (non-binding)
+
+     - Components that fit the (TBD) goals of Commons, have a more
+       "logical" home elsewhere in the ASF, but were rejected by that
+       home?
+        +1: gstein, donaldp
+         0: striker (on a case by case basis, taking reasons for rejection
+                     seriously into account. Abstain from vote until
+                     rephrased),
+            fitz (what striker said), aaron (what fitz said)
+        -1: jerenkrantz, sanders (non-binding)
+
+      FOLD BELOW VOTES INTO ABOVE? (i.e. eliminate the "donation" wording)
+      - Donations that could fit but have a more obvious (proper) home which
+        has already rejected it?
+        +1: coar, donaldp, gstein (note the "might fit" term)
+        -0:
+        -1: jerenkrantz, jim, aaron, striker, fitz
+
+      - Existing ASF components whose committers believe that they
+        are a better fit under commons and the commons PMC agrees?
+        (If this component were brought up as new, we would accept it.)
+        +1: coar, donaldp, jerenkrantz, striker, gstein, fitz
+        -1: jim (by this definition httpd could be in commons)
+                (gstein says: see the "if" part; we wouldn't accept httpd)
+                (jim says: until we better define what the PMC would or
+                 would not accept, then this seems too wishy-washy to me)
+             (gstein says: jim, you're blocking closure on this;
+              how would you refine the phrasing here; the intent
+              here is to accept components from the other Commons
+              projects or projects with reusable component),
+             aaron (we need to differenciate ourselves from other
+                    libraries first, namely APR)
+
+      - Packages being worked on by Apache developers, with a clear
+        affiliation, that can't or won't be bundled?  (E.g., an
+        httpd module)
+        +1: coar, donaldp
+        -1: jerenkrantz, striker, gstein, fitz, jim, aaron
+       CLOSE THIS? (as "not passed"; what is a good way to phrase this?)
+
+      - Should we have a minimum bar of entry for components?
+        +1:
+        -0: donaldp, gstein
+        -1:
+       
+      - Should we have a minimum set of requirements before components
+        are released?
+        +1: donaldp, gstein (mixed, see below), striker
+        -1: jerenkrantz (what is released?)
+
+      - If yes to above then which things should be part of minimum
+        requirements?
+
+        documentation: require basic overview and user docs
+        +1: donaldp
+        -0: gstein (recommend highly, but let the committers determine
+                    what is right for the component),
+            striker, jerenkrantz
+        -1:
+
+        uptodate website: require website be updated to latest release
+                          but may still host previous release docs.
+        +1: donaldp, gstein, striker
+        -0: jerenkrantz
+        -1:
+
+        unit tests: (okay so this will never get consensus but ...)
+        +1: donaldp
+        -1: gstein (unit tests should be recommended, but not
+                    mandated; I also find it unreasonable for initial
+                    development/pre-alpha releases, but it can make
+                    sense for "final" types of releases),
+            striker, jerenkrantz
+
+        versioning standard: derived from
+            http://apr.apache.org/versioning.html
+            http://jakarta.apache.org/commons/versioning.html
+        +1: donaldp, gstein, striker, jerenkrantz
+        -1:
+
+        release process: derived from
+          http://jakarta.apache.org/commons/releases.html
+          http://jakarta.apache.org/turbine/maven/development/release-process.html
+          http://cvs.apache.org/viewcvs.cgi/jakarta-ant/ReleaseInstructions?rev=1.9.2.1&content-type=text/vnd.viewcvs-markup
+        +1: donaldp
+        -1: gstein (we should provide "best practices" but allow each
+                    components' committers to define their rules),
+            striker, jerenkrantz
+
+        deprecation process: (java specific?)
+          http://jakarta.apache.org/turbine/maven/development/deprecation.html
+        +1: donaldp, gstein (I see this as part of the "versioning"
+                             process, and we can provide best
+                             practices here)
+        -0: jerenkrantz (kinda sorta versioning, but not quite)
+        -1:
+
+        CVS/Subversion branching:
+          http://jakarta.apache.org/turbine/maven/development/branches.html
+        +1: donaldp
+        -1: gstein (we should provide "best practices" but allow each
+                    components' committers to define their rules),
+            striker, jerenkrantz
+
+
+Candidate Projects:
+
+    o APR's serf project has voted itself to move into Commons.
+    
+      - Should the PMC accept it as fitting the Commons goal?
+        +1: gstein, fitz, jerenkrantz, striker, donaldp
+        -1: aaron (no such thing as "the Commons goal", how can it fit it?)
+
+      - When should it move?
+
+        Whenever it likes:
+          +1: gstein, sanders, jerenkrantz, striker
+          +0: donaldp (+1 if we use subversion, but if using CVS 
+              we should hold off until structure is decided upon)
+          -1: aaron (after we know why it fits)
+
+        Give us a while:
+          +1: fitz (what's the hurry?), aaron
+          -0: gstein (we're only talking about a small seed of a
+                codebase; it won't get in our way as we complete the
+                charter), striker
+
+      - Where should the CVS code be located?
+      
+        commons/serf   (each component under top-level)
+            +1: sanders (works well at jakarta-commons)
+                fitz (Please don't mix interface and implementation 
+                  of commons!), aaron
+            +0: jerenkrantz
+          -0.5: gstein
+            -1: donaldp (makes it difficult to update all related 
+                         projects with a single sweep)
+
+        commons/components/serf   (all components under this dir,
+            leaving the top open for other non-code items)
+          +0: gstein, striker, donaldp (is this just dev with a 
+                                        different name?
+                                        (gstein says "yes"))
+          -1: fitz, aaron, jerenkrantz
+        
+        commons/clients/serf   (topical-groups under top-level)
+          +1: gstein, jerenkrantz
+          -1: fitz, aaron, donaldp
+        
+        commons/dev/serf  (all components under "dev")
+          +1: gstein, donaldp (if we are having a single 
+                               monolithic repo for all commons)
+          -1: fitz, aaron, jerenkrantz
+        
+        commons/bootstrap/serf  (serf is very early stage, so maybe we
+            have a "bootstrap" area; this is different from Incubator
+            since the existing committers do not need "training")
+          +1: gstein, donaldp
+          -1: fitz, aaron, jerenkrantz
+
+        commons/???
+
+        commons/c/serf (separate out component based on language
+                        and then have a flat structure underneath)
+          +1: donaldp
+          -1: jerenkrantz
+
+      - What mailing list should it use for dev discussions?
+      
+        general@commons.apache.org:  (one group for all discussion;
+                                      dev and non-dev alike)
+          -0.5: gstein
+            -1: striker, aaron, jerenkrantz
+        
+        dev@commons.apache.org:  (one group for dev discussion;
+                                  general@ remains for non-dev)
+          +1: gstein, fitz, sanders, jerenkrantz, striker, donaldp
+          
+        clients-dev@commons.apache.org:
+           (this is really TOPICNAME-dev@ where I preselected
+            "clients" for TOPICNAME; this question is whether this
+            style would be appropriate)
+          +1: gstein, striker
+          -0: sanders, donaldp (maybe in the future but too early),
+              jerenkrantz
+          -1: aaron (what is "clients"? I'd probably be +1 if I knew
+                     what that was)
+
+      - Note: serf has no web site, so there isn't a need to figure
+        that out right now.
+
+
+Assets:
+    DNS:                commons.apache.org
+    
+    Mailing lists:      general@commons.apache.org
+                        announce@commons.apache.org
+                        pmc@commons.apache.org
+                        cvs@commons.apache.org
+                        
+                        [ core-cvs@commons.apache.org in case we
+                          create a commons-core CVS module ]
+
+    Web site:           http://commons.apache.org/
+    
+    Repositories:       commons        (code, info, etc)
+                        commons-site   (the web site)
+
+
+PMC Members:
+
+    Aaron Bannert <aaron@apache.org>
+    Ken Coar <coar@apache.org>
+    Peter Donald <peter@apache.org>
+    Justin Erenkrantz <jerenkrantz@apache.org>
+    Brian W. Fitzpatrick <fitz@apache.org>
+    Jim Jagielski <jim@apache.org>
+    Geir Magnusson Jr. <geirm@apache.org>
+    Greg Stein <gstein@lyra.org>
+    Sander Striker <striker@apache.org>
+
+    Note: Ken Coar is the Chair
+
+
+PMC Members, pending Board approval:
+
+    none yet
+
+    [ this may become obsolete; the Board is discussing a way for the
+      Chair to directly alter the PMC membership; until then, however,
+      we need PMC members ratified by the board, and this tracks them ]
+
+
+Committers:
+
+    none yet [still defining mission]
+
+
+Invited Committers:
+
+    none yet
+
+
+Current mission/charter as approved by the board:
+
+    'The Apache Commons PMC hereby is responsible for the creation
+    and maintenance of software related to reusable libraries and
+    components, based on software licensed to the Foundation.'
+
+The complete text of the resolution that was passed is:
+
+       WHEREAS, the Board of Directors deems it to be in the best
+       interests of the Foundation and consistent with the
+       Foundation's purpose to establish a Project Management
+       Committee charged with the creation and maintenance of
+       open-source software related to reusable libraries and
+       components, for distribution at no charge to the public.
+
+       NOW, THEREFORE, BE IT RESOLVED, that a Project Management
+       Committee (PMC), to be known as the "Apache Commons PMC", be
+       and hereby is established pursuant to Bylaws of the Foundation;
+       and be it further
+
+       RESOLVED, that the Apache Commons PMC be and hereby is
+       responsible for the creation and maintenance of software
+       related to reusable libraries and components, based on software
+       licensed to the Foundation; and be it further
+
+       RESOLVED, that the office of "Vice President, Apache Commons"
+       be and hereby is created, the person holding such office to
+       serve at the direction of the Board of Directors as the chair
+       of the Apache Commons PMC, and to have primary responsibility
+       for management of the projects within the scope of
+       responsibility of the Apache Commons PMC; and be it further
+
+       RESOLVED, that the persons listed immediately below be and
+       hereby are appointed to serve as the initial members of the
+       Apache Commons PMC:
+
+              Aaron Bannert
+              Ken Coar (chair)
+              Peter Donald
+              Justin Erenkrantz
+              Brian W. Fitzpatrick
+              Jim Jagielski
+              Geir Magnusson Jr.
+              Greg Stein
+              Sander Striker
+
+       NOW, THEREFORE, BE IT FURTHER RESOLVED, that Ken Coar be and
+       hereby is appointed to the office of Vice President, Apache
+       Commons, to serve in accordance with and subject to the
+       direction of the Board of Directors and the Bylaws of the
+       Foundation until death, resignation, retirement, removal or
+       disqualification, or until a successor is appointed; and be it
+       further
+
+       RESOLVED, that the initial Apache Commons PMC be and hereby is
+       tasked with the creation of a set of bylaws intended to
+       encourage open development and increased participation in the
+       Apache Commons Project.
+
+#
+# Local Variables:
+# mode: indented-text
+# tab-width: 4
+# indent-tabs-mode: nil
+# tab-stop-list: (4 6 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80)
+# End:
+#
diff --git a/test-svn-fe.c b/test-svn-fe.c
index 77cf78a..658c2a7 100644
--- a/test-svn-fe.c
+++ b/test-svn-fe.c
@@ -4,14 +4,37 @@
 
 #include "git-compat-util.h"
 #include "vcs-svn/svndump.h"
+#include "vcs-svn/svndiff.h"
+#include "vcs-svn/line_buffer.h"
 
 int main(int argc, char *argv[])
 {
-	if (argc != 2)
-		usage("test-svn-fe <file>");
-	svndump_init(argv[1]);
-	svndump_read(NULL);
-	svndump_deinit();
-	svndump_reset();
-	return 0;
+	static const char test_svnfe_usage[] =
+		"test-svn-fe (<dumpfile> | -d <preimage> <delta> <len>)";
+	if (argc < 2)
+		usage(test_svnfe_usage);
+	if (argc == 2) {
+		svndump_init(argv[1]);
+		svndump_read(NULL);
+		svndump_deinit();
+		svndump_reset();
+		return 0;
+	}
+	if (argc == 5 && !strcmp(argv[1], "-d")) {
+		struct line_buffer preimage = LINE_BUFFER_INIT;
+		struct line_buffer delta = LINE_BUFFER_INIT;
+		buffer_init(&preimage, argv[2]);
+		buffer_init(&delta, argv[3]);
+		if (svndiff0_apply(&delta, (off_t) strtoull(argv[4], NULL, 0),
+				   &preimage, stdout))
+			return 1;
+		if (buffer_deinit(&preimage))
+			die_errno("cannot close preimage");
+		if (buffer_deinit(&delta))
+			die_errno("cannot close delta");
+		buffer_reset(&preimage);
+		buffer_reset(&delta);
+		return 0;
+	}
+	usage(test_svnfe_usage);
 }
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index df0b1a2..1668bf7 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -4,6 +4,7 @@
  */
 
 #include "git-compat-util.h"
+#include "sliding_window.h"
 #include "line_buffer.h"
 
 /*
@@ -12,15 +13,53 @@
  * See http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff.
  *
  * svndiff0 ::= 'SVN\0' window window*;
+ * window ::= int int int int int instructions inline_data;
+ * instructions ::= instruction*;
+ * instruction ::= view_selector int int
+ *   | copyfrom_data int
+ *   | packed_view_selector int
+ *   | packed_copyfrom_data
+ *   ;
+ * view_selector ::= copyfrom_source
+ *   | copyfrom_target
+ *   ;
+ * copyfrom_source ::= # binary 00 000000;
+ * copyfrom_target ::= # binary 01 000000;
+ * copyfrom_data ::= # binary 10 000000;
+ * packed_view_selector ::= # view_selector OR-ed with 6 bit value;
+ * packed_copyfrom_data ::= # copyfrom_data OR-ed with 6 bit value;
  * int ::= highdigit* lowdigit;
  * highdigit ::= # binary 1000 0000 OR-ed with 7 bit value;
  * lowdigit ::= # 7 bit value;
  */
 
+#define INSN_MASK	0xc0
+#define INSN_COPYFROM_SOURCE	0x00
+#define INSN_COPYFROM_TARGET	0x40
+#define INSN_COPYFROM_DATA	0x80
+#define OPERAND_MASK	0x3f
+
 #define VLI_CONTINUE	0x80
 #define VLI_DIGIT_MASK	0x7f
 #define VLI_BITS_PER_DIGIT 7
 
+struct window {
+	struct {
+		struct view *window;
+		size_t declared_len;
+	} in;
+	struct {
+		off_t off;
+		struct strbuf buf;
+		size_t declared_len;
+	} out;
+	struct strbuf instructions;
+	struct {
+		struct strbuf buf;
+		size_t declared_len;
+	} data;
+};
+
 static int read_magic(struct line_buffer *in, off_t *len)
 {
 	static const char magic[] = {'S', 'V', 'N', '\0'};
@@ -75,3 +114,229 @@ static int parse_int(const char **buf, size_t *result, const char *end)
 	return error("Invalid instruction: incomplete integer %"PRIu64,
 		     (uint64_t) rv);
 }
+
+static int read_offset(struct line_buffer *in, off_t *result, off_t *len)
+{
+	uintmax_t val;
+	if (read_int(in, &val, len))
+		return -1;
+	if (val > maximum_signed_value_of_type(off_t))
+		return error("Unrepresentable offset: %"PRIuMAX, val);
+	*result = val;
+	return 0;
+}
+
+static int read_length(struct line_buffer *in, size_t *result, off_t *len)
+{
+	uintmax_t val;
+	if (read_int(in, &val, len))
+		return -1;
+	if (val > SIZE_MAX)
+		return error("Unrepresentable length: %"PRIuMAX, val);
+	*result = val;
+	return 0;
+}
+
+static int read_chunk(struct line_buffer *delta, off_t *delta_len,
+		      struct strbuf *buf, size_t len)
+{
+	int truncated = 0;
+	strbuf_reset(buf);
+	/* Need to truncate? */
+	if (len > maximum_signed_value_of_type(off_t)) {
+		len = (size_t) maximum_signed_value_of_type(off_t);
+		truncated = 1;
+	}
+	if ((off_t) len > *delta_len) {
+		len = *delta_len;
+		truncated = 1;
+	}
+	buffer_read_binary(buf, len, delta);
+	*delta_len -= buf->len;
+	if (buf->len < len)
+		truncated = 1;
+	return truncated;
+}
+
+
+static int write_strbuf(struct strbuf *sb, FILE *out)
+{
+	if (fwrite(sb->buf, 1, sb->len, out) == sb->len)	/* Success. */
+		return 0;
+	return error("Cannot write: %s", strerror(errno));
+}
+
+static int copyfrom_source(struct window *ctx, const char **instructions,
+			   size_t nbytes, const char *insns_end)
+{
+	size_t offset;
+	if (parse_int(instructions, &offset, insns_end))
+		return -1;
+	if (unsigned_add_overflows(offset, nbytes) ||
+	    offset + nbytes > ctx->in.declared_len)
+		return error("Invalid delta: copies unallocated source data.");
+	if (offset + nbytes > ctx->in.window->buf.len)	/* Input exhausted. */
+		nbytes = ctx->in.window->buf.len - offset;
+	strbuf_add(&ctx->out.buf, ctx->in.window->buf.buf + offset, nbytes);
+	return 0;
+}
+
+static int copyfrom_target(struct window *ctx, const char **instructions,
+			   size_t nbytes, const char *insns_end)
+{
+	const size_t out_pos = ctx->out.buf.len;
+	size_t offset;
+	if (parse_int(instructions, &offset, insns_end))
+		return -1;
+	if (offset > out_pos)
+		return error("Invalid delta: copies from the future.");
+	if (unsigned_add_overflows(offset, nbytes) ||
+	    offset + nbytes > ctx->out.declared_len)
+		return error("Invalid delta: copies unallocated target data.");
+	while (nbytes) {
+		strbuf_addch(&ctx->out.buf, ctx->out.buf.buf[offset++]);
+		nbytes--;
+	}
+	return 0;
+}
+
+static int copyfrom_data(struct window *ctx, size_t *data_pos, size_t nbytes)
+{
+	const size_t pos = *data_pos;
+	if (unsigned_add_overflows(pos, nbytes) ||
+	    pos + nbytes > ctx->data.declared_len)
+		return error("Invalid delta: copies unallocated inline data.");
+	if (pos + nbytes > ctx->data.buf.len)	/* Data exhausted. */
+		nbytes = ctx->data.buf.len - pos;
+	strbuf_add(&ctx->out.buf, ctx->data.buf.buf + pos, nbytes);
+	*data_pos += nbytes;
+	return 0;
+}
+
+static int parse_first_operand(const char **buf, size_t *out, const char *end)
+{
+	size_t result = (unsigned char) *(*buf)++ & OPERAND_MASK;
+	if (result) {
+		*out = result;
+		return 0;
+	}
+	return parse_int(buf, out, end);
+}
+
+static int step(struct window *ctx, const char **instructions, size_t *data_pos)
+{
+	unsigned char instruction;
+	const char *insns_end = ctx->instructions.buf + ctx->instructions.len;
+	const size_t out_pos = ctx->out.buf.len;
+	size_t nbytes;
+	assert(ctx);
+	assert(instructions && *instructions);
+
+	instruction = (unsigned char) **instructions;
+	if (parse_first_operand(instructions, &nbytes, insns_end))
+		return -1;
+	if (unsigned_add_overflows(out_pos, nbytes) ||
+	    out_pos + nbytes > ctx->out.declared_len)
+		return error("Invalid delta: output overflows buffer.");
+
+	switch (instruction & INSN_MASK) {
+	case INSN_COPYFROM_SOURCE:
+		return copyfrom_source(ctx, instructions, nbytes, insns_end);
+	case INSN_COPYFROM_TARGET:
+		return copyfrom_target(ctx, instructions, nbytes, insns_end);
+	case INSN_COPYFROM_DATA:
+		return copyfrom_data(ctx, data_pos, nbytes);
+	default:
+		return error("Invalid instruction %x",
+			     (unsigned int) instruction);
+	}
+}
+
+static int apply_window_in_core(struct window *ctx)
+{
+	const char *insn = ctx->instructions.buf;
+	size_t data_pos = 0;
+
+	/*
+	 * Advance insn while copying data from the source, target,
+	 * and inline data views.
+	 */
+	while (insn && insn != ctx->instructions.buf + ctx->instructions.len)
+		if (step(ctx, &insn, &data_pos))
+			return -1;
+	return 0;
+}
+
+static int apply_one_window(struct line_buffer *delta, off_t *delta_len,
+			    struct view *preimage, off_t preimage_len,
+			    off_t *out_offset, FILE *out)
+{
+	struct window ctx = {
+		{preimage, preimage_len},	/* preimage */
+		{*out_offset, STRBUF_INIT, 0},	/* postimage */
+		STRBUF_INIT,			/* instructions */
+		{STRBUF_INIT, 0}		/* inline data */
+	};
+	size_t instructions_len;
+	int rv = 0;
+	assert(out_offset);
+	assert(delta_len);
+
+	/* "source view" offset and length already handled; */
+	if (read_length(delta, &ctx.out.declared_len, delta_len) ||
+	    read_length(delta, &instructions_len, delta_len) ||
+	    read_length(delta, &ctx.data.declared_len, delta_len))
+		return -1;
+	if (read_chunk(delta, delta_len, &ctx.instructions, instructions_len))
+		warning("Invalid delta: "
+			"incomplete instructions (%"PRIu64"/%"PRIu64")",
+			(uint64_t) ctx.instructions.len,
+			(uint64_t) instructions_len);
+	if (read_chunk(delta, delta_len, &ctx.data.buf, ctx.data.declared_len))
+		; /* inline data is truncated.  okay. */
+	if (buffer_ferror(delta)) {
+		rv = error("Cannot read delta: %s", strerror(errno));
+		goto done;
+	}
+	if (apply_window_in_core(&ctx) || write_strbuf(&ctx.out.buf, out)) {
+		rv = -1;
+		goto done;
+	}
+ done:
+	strbuf_release(&ctx.out.buf);
+	strbuf_release(&ctx.instructions);
+	strbuf_release(&ctx.data.buf);
+	return rv;
+}
+
+int svndiff0_apply(struct line_buffer *delta, off_t delta_len,
+		   struct line_buffer *preimage, FILE *postimage)
+{
+	struct view preimage_view = {preimage, 0, STRBUF_INIT};
+	off_t out_offset = 0;
+	assert(delta && preimage && postimage);
+
+	if (read_magic(delta, &delta_len))
+		goto fail;
+	while (delta_len > 0) {	/* For each window: */
+		off_t pre_off = pre_off;
+		size_t pre_len;
+		if (read_offset(delta, &pre_off, &delta_len) ||
+		    read_length(delta, &pre_len, &delta_len) ||
+		    move_window(&preimage_view, pre_off, pre_len) ||
+		    apply_one_window(delta, &delta_len,
+				     &preimage_view, pre_len,
+				     &out_offset, postimage))
+			goto fail;
+		if (delta_len && buffer_at_eof(delta)) {
+			error("Delta ends early! (%"PRIu64" bytes remaining)",
+			      (uint64_t) delta_len);
+			goto fail;
+		}
+	}
+	strbuf_release(&preimage_view.buf);
+	return 0;
+ fail:
+	strbuf_release(&preimage_view.buf);
+	return -1;
+}
diff --git a/vcs-svn/svndiff.h b/vcs-svn/svndiff.h
new file mode 100644
index 0000000..a986099
--- /dev/null
+++ b/vcs-svn/svndiff.h
@@ -0,0 +1,9 @@
+#ifndef SVNDIFF_H_
+#define SVNDIFF_H_
+
+#include "line_buffer.h"
+
+extern int svndiff0_apply(struct line_buffer *delta, off_t delta_len,
+			  struct line_buffer *preimage, FILE *postimage);
+
+#endif
-- 
1.7.2.3


* [PATCH/RFC 0/11] Building up the delta parser
  2010-10-11  4:01         ` [PATCH/RFC 16'/16] vcs-svn: Add svn delta parser Jonathan Nieder
@ 2010-10-13  9:17           ` Jonathan Nieder
  2010-10-13  9:19             ` [PATCH 01/11] fixup! vcs-svn: Learn to parse variable-length integers Jonathan Nieder
                               ` (12 more replies)
  0 siblings, 13 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-13  9:17 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

Jonathan Nieder wrote:

> Implement an svndiff 0 interpreter

I hear that was nigh unreadable, so here's a reroll.  Less
cargo-cult support for broken deltas, more readability and tests.
Patches apply on top of "[PATCH 15/16] t9010 (svn-fe): Eliminate
dependency on svn perl bindings".  As before, the end result
includes a 'test-svn-fe -d' command that can apply svndiff0-format
deltas, meaning less binary garbage to worry about as you puzzle
over that confusing "svnrdump dump" output in debugging sessions.

Questions?  Improvements?  Bugs?

Patch 1 is a fixup to the variable-length integer parsing code, to
report unexpected EOF (i.e., declared content length too long)
correctly when it occurs in the middle of such an integer.

Patch 2 is the svndiff0 interpreter in broad strokes: read window,
read window, read window, ....  The patch doesn't encode any
knowledge about what actually goes _in_ a window aside from the
header, so it will error out for nonempty windows.

Patch 3 teaches the nascent interpreter to keep the appropriate
piece of the preimage in memory.  This is probably earlier in the
series than it ought to be, but I wanted to try out the sliding
window code.

With patches 4 and 5, the interpreter learns to read the "data"
and "instructions" section of a window.  The effect is observable
because it finds the beginning of the next window correctly.

Patch 6 is an example instruction (copyfrom_data).

Patches 7-8 introduce some sanity checks.

Patches 9 and 10 are another instruction (copyfrom_target) and
another sanity check.

Patch 11 is the last instruction (copyfrom_source).  That's it.
You can apply deltas now!
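
For readers who want the end result in one place, here is a minimal
sketch of what applying a single window amounts to once all three
instructions exist.  It is not the code from this series: it works on
plain in-memory buffers rather than strbufs and line_buffers, every
sketch_* name is hypothetical, there is no integer-overflow check in
the integer parser, and the copyfrom_target bound is slightly stricter
than the one in the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INSN_MASK            0xc0
#define INSN_COPYFROM_SOURCE 0x00
#define INSN_COPYFROM_TARGET 0x40
#define INSN_COPYFROM_DATA   0x80
#define OPERAND_MASK         0x3f

/* Hypothetical helper: decode one variable-length integer from [*p, end). */
static int sketch_parse_int(const unsigned char **p, const unsigned char *end,
			    size_t *result)
{
	size_t rv = 0;
	while (*p < end) {
		unsigned char ch = *(*p)++;
		rv = (rv << 7) | (ch & 0x7f);	/* no overflow check here */
		if (!(ch & 0x80)) {
			*result = rv;
			return 0;
		}
	}
	return -1;	/* ran out of bytes in the middle of an integer */
}

/*
 * Hypothetical helper: apply one window held entirely in memory.
 * Returns the number of bytes written to out, or -1 for an invalid delta.
 */
static long sketch_apply_window(const unsigned char *src, size_t src_len,
				const unsigned char *insns, size_t insns_len,
				const unsigned char *data, size_t data_len,
				unsigned char *out, size_t out_len)
{
	const unsigned char *p = insns, *end = insns + insns_len;
	size_t out_pos = 0, data_pos = 0;

	assert(src && insns && data && out);
	while (p < end) {
		unsigned char insn = *p++;
		size_t nbytes = insn & OPERAND_MASK, offset = 0;

		/* A zero packed length means the length follows as an int. */
		if (!nbytes && sketch_parse_int(&p, end, &nbytes))
			return -1;
		if (nbytes > out_len - out_pos)
			return -1;	/* output would overflow */
		switch (insn & INSN_MASK) {
		case INSN_COPYFROM_SOURCE:
			if (sketch_parse_int(&p, end, &offset) ||
			    offset > src_len || nbytes > src_len - offset)
				return -1;
			memcpy(out + out_pos, src + offset, nbytes);
			out_pos += nbytes;
			break;
		case INSN_COPYFROM_TARGET:
			if (sketch_parse_int(&p, end, &offset) ||
			    offset >= out_pos)
				return -1;	/* copies from the future */
			/* Byte by byte: the range may overlap the output. */
			while (nbytes--)
				out[out_pos++] = out[offset++];
			break;
		case INSN_COPYFROM_DATA:
			if (nbytes > data_len - data_pos)
				return -1;
			memcpy(out + out_pos, data + data_pos, nbytes);
			out_pos += nbytes;
			data_pos += nbytes;
			break;
		default:
			return -1;	/* 0xc0 is not an instruction */
		}
	}
	return (long) out_pos;
}
```

With preimage "foo", inline data "bar" and the instruction bytes
0x83 0x02 0x01 0x44 0x00, this sketch writes "baroobaro".  The
byte-by-byte loop for copyfrom_target is deliberate: a copy of 3 bytes
from offset 0 after a single "a" has been written produces "aaaa".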

If anything seems unclear, please don't spend time puzzling it
out --- just yell at me, so the code or documentation can be
cleaned up.  Happy reading.


* [PATCH 01/11] fixup! vcs-svn: Learn to parse variable-length integers
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
@ 2010-10-13  9:19             ` Jonathan Nieder
  2010-10-13  9:21             ` [PATCH 02/11] vcs-svn: Skeleton of an svn delta parser Jonathan Nieder
                               ` (11 subsequent siblings)
  12 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-13  9:19 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

Report EOF correctly in integer parsing code.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
This patch is meant for squashing.
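
To illustrate the distinction this fixup draws, here is a rough sketch
of a variable-length integer reader that tells "the stream ended
before the declared length" apart from "the declared length ran out in
the middle of an integer".  It reads from a memory buffer instead of a
line_buffer, and sketch_read_int and the SKETCH_* codes are made-up
names, not part of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_status {
	SKETCH_OK = 0,
	SKETCH_EOF = -1,	/* stream shorter than declared */
	SKETCH_INCOMPLETE = -2	/* declared length ends mid-integer */
};

/*
 * Hypothetical stand-in for read_int(): the stream really holds
 * `avail` bytes, while the caller was told to expect `*declared` more.
 */
static int sketch_read_int(const unsigned char *in, size_t avail, size_t *pos,
			   size_t *declared, uintmax_t *result)
{
	uintmax_t rv = 0;

	assert(in && pos && declared && result);
	while (*declared) {
		unsigned char ch;
		if (*pos >= avail)
			return SKETCH_EOF;	/* what the fixup reports */
		ch = in[(*pos)++];
		(*declared)--;
		rv = (rv << 7) + (ch & 0x7f);
		if (!(ch & 0x80)) {		/* low digit: integer ends */
			*result = rv;
			return SKETCH_OK;
		}
	}
	return SKETCH_INCOMPLETE;
}
```

Before the fixup, the EOF case fell out of the loop and was reported
as an incomplete integer, which hides the real problem (a declared
content length that is too long).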

 vcs-svn/svndiff.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index df0b1a2..36d2b30 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -45,7 +45,8 @@ static int read_int(struct line_buffer *in, uintmax_t *result, off_t *len)
 	while (sz) {
 		int ch = buffer_read_char(in);
 		if (ch == EOF)
-			break;
+			return error("Delta ends early (%"PRIu64" bytes remaining)",
+				     (uint64_t) sz);
 		sz--;
 		rv <<= VLI_BITS_PER_DIGIT;
 		rv += (ch & VLI_DIGIT_MASK);
-- 
1.7.2.3


* [PATCH 02/11] vcs-svn: Skeleton of an svn delta parser
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
  2010-10-13  9:19             ` [PATCH 01/11] fixup! vcs-svn: Learn to parse variable-length integers Jonathan Nieder
@ 2010-10-13  9:21             ` Jonathan Nieder
  2010-10-13  9:30             ` [PATCH 03/11] vcs-svn: Read the preimage while applying deltas Jonathan Nieder
                               ` (10 subsequent siblings)
  12 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-13  9:21 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

A delta in the subversion delta (svndiff0) format consists of the
magic bytes SVN\0 followed by a sequence of windows, each beginning
with a window header consisting of five integers (with variable-length
representation):

	source view offset
	source view length
	output length
	instructions length
	auxiliary data length

Add an svndiff0_apply() function and a test-svn-fe -d command-line
tool to parse such a delta in the special case where it does not
include any instructions or auxiliary data.

Later patches will add features to turn this into a fully functional
delta applier, for use by svn-fe in parsing the streams produced by
"svnrdump dump" and "svnadmin dump --deltas".

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
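As a reading aid, the five-integer window header described above can
be decoded roughly as follows.  This is only a sketch over an
in-memory buffer; struct sketch_window_header and sketch_read_header
are hypothetical names, and the real code reads from a line_buffer
and range-checks each value against off_t or size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_window_header {
	uintmax_t source_offset;
	uintmax_t source_len;
	uintmax_t out_len;
	uintmax_t instructions_len;
	uintmax_t data_len;
};

/* Decode five variable-length integers starting at buf[*pos]. */
static int sketch_read_header(const unsigned char *buf, size_t len,
			      size_t *pos, struct sketch_window_header *hdr)
{
	uintmax_t vals[5];
	size_t i;

	assert(buf && pos && hdr);
	for (i = 0; i < 5; i++) {
		uintmax_t val = 0;
		for (;;) {
			unsigned char ch;
			if (*pos >= len)
				return -1;	/* header is truncated */
			ch = buf[(*pos)++];
			val = (val << 7) | (ch & 0x7f);
			if (!(ch & 0x80))	/* low digit ends the int */
				break;
		}
		vals[i] = val;
	}
	hdr->source_offset = vals[0];
	hdr->source_len = vals[1];
	hdr->out_len = vals[2];
	hdr->instructions_len = vals[3];
	hdr->data_len = vals[4];
	return 0;
}
```

For the "nonempty (but unused) preimage view" test below, the header
bytes 00 03 00 00 00 decode to a source view of length 3 at offset 0
with no output, instructions, or data.
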
 Makefile          |    4 +-
 t/t9011-svn-da.sh |   82 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 test-svn-fe.c     |   39 ++++++++++++++++++++----
 vcs-svn/svndiff.c |   64 +++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndiff.h |    9 ++++++
 5 files changed, 189 insertions(+), 9 deletions(-)
 create mode 100755 t/t9011-svn-da.sh
 create mode 100644 vcs-svn/svndiff.h

diff --git a/Makefile b/Makefile
index d99da33..966f5c7 100644
--- a/Makefile
+++ b/Makefile
@@ -1766,7 +1766,7 @@ XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
 VCSSVN_OBJS = vcs-svn/string_pool.o vcs-svn/line_buffer.o \
 	vcs-svn/repo_tree.o vcs-svn/fast_export.o vcs-svn/svndump.o \
-	vcs-svn/sliding_window.o
+	vcs-svn/sliding_window.o vcs-svn/svndiff.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
 
 dep_files := $(foreach f,$(OBJECTS),$(dir $f).depend/$(notdir $f).d)
@@ -1893,7 +1893,7 @@ xdiff-interface.o $(XDIFF_OBJS): \
 $(VCSSVN_OBJS): \
 	vcs-svn/obj_pool.h vcs-svn/trp.h vcs-svn/string_pool.h \
 	vcs-svn/line_buffer.h vcs-svn/repo_tree.h vcs-svn/fast_export.h \
-	vcs-svn/svndump.h vcs-svn/sliding_window.h
+	vcs-svn/sliding_window.h vcs-svn/svndump.h vcs-svn/svndiff.h
 endif
 
 exec_cmd.s exec_cmd.o: EXTRA_CPPFLAGS = \
diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
new file mode 100755
index 0000000..8dccd16
--- /dev/null
+++ b/t/t9011-svn-da.sh
@@ -0,0 +1,82 @@
+#!/bin/sh
+
+test_description='test handling of deltas by dumpfile importer'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	>empty &&
+	printf foo >preimage
+'
+
+test_expect_success 'reject empty delta' '
+	test_must_fail test-svn-fe -d preimage empty 0
+'
+
+test_expect_success 'delta can empty file' '
+	printf "SVNQ" | q_to_nul >clear.delta &&
+	test-svn-fe -d preimage clear.delta 4 >actual &&
+	test_cmp empty actual
+'
+
+test_expect_success 'one-window empty delta' '
+	printf "SVNQ%s" "QQQQQ" | q_to_nul >clear.onewindow &&
+	test-svn-fe -d preimage clear.onewindow 9 >actual &&
+	test_cmp empty actual
+'
+
+test_expect_success 'incomplete window header' '
+	printf "SVNQ%s" "QQQQQ" | q_to_nul >clear.onewindow &&
+	printf "SVNQ%s" "QQ" | q_to_nul >clear.partialwindow &&
+	test_must_fail test-svn-fe -d preimage clear.onewindow 6 &&
+	test_must_fail test-svn-fe -d preimage clear.partialwindow 6
+'
+
+test_expect_success 'declared delta longer than actual delta' '
+	printf "SVNQ%s" "QQQQQ" | q_to_nul >clear.onewindow &&
+	printf "SVNQ%s" "QQ" | q_to_nul >clear.partialwindow &&
+	test_must_fail test-svn-fe -d preimage clear.onewindow 14 &&
+	test_must_fail test-svn-fe -d preimage clear.partialwindow 9
+'
+
+test_expect_success 'two-window empty delta' '
+	printf "SVNQ%s%s" "QQQQQ" "QQQQQ" | q_to_nul >clear.twowindow &&
+	test-svn-fe -d preimage clear.twowindow 14 >actual &&
+	test_must_fail test-svn-fe -d preimage clear.twowindow 13 &&
+	test_cmp empty actual
+'
+
+test_expect_success 'noisy zeroes' '
+	printf "SVNQ%s" \
+		"RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRQQQQQ" |
+		tr R "\200" |
+		q_to_nul >clear.noisy &&
+	len=$(wc -c <clear.noisy) &&
+	test-svn-fe -d preimage clear.noisy $len >actual &&
+	test_cmp empty actual
+'
+
+test_expect_success 'reject variable-length int in magic' '
+	printf "SVNRQ" | tr R "\200" | q_to_nul >clear.badmagic &&
+	test_must_fail test-svn-fe -d preimage clear.badmagic 5
+'
+
+test_expect_success 'truncated integer' '
+	printf "SVNQ%s%s" "QQQQQ" "QQQQRRQ" |
+		tr R "\200" |
+		q_to_nul >clear.fullint &&
+	printf "SVNQ%s%s" "QQQQQ" "QQQQRR" |
+		tr RT "\201" |
+		q_to_nul >clear.partialint &&
+	test_must_fail test-svn-fe -d preimage clear.fullint 15 &&
+	test-svn-fe -d preimage clear.fullint 16 &&
+	test_must_fail test-svn-fe -d preimage clear.partialint 15
+'
+
+test_expect_success 'nonempty (but unused) preimage view' '
+	printf "SVNQ%b" "Q\003QQQ" | q_to_nul >clear.readpreimage &&
+	test-svn-fe -d preimage clear.readpreimage 9 >actual &&
+	test_cmp empty actual
+'
+
+test_done
diff --git a/test-svn-fe.c b/test-svn-fe.c
index 77cf78a..197a2c3 100644
--- a/test-svn-fe.c
+++ b/test-svn-fe.c
@@ -4,14 +4,39 @@
 
 #include "git-compat-util.h"
 #include "vcs-svn/svndump.h"
+#include "vcs-svn/svndiff.h"
+#include "vcs-svn/line_buffer.h"
 
 int main(int argc, char *argv[])
 {
-	if (argc != 2)
-		usage("test-svn-fe <file>");
-	svndump_init(argv[1]);
-	svndump_read(NULL);
-	svndump_deinit();
-	svndump_reset();
-	return 0;
+	static const char test_svnfe_usage[] =
+		"test-svn-fe (<dumpfile> | [-d] <preimage> <delta> <len>)";
+	if (argc < 2)
+		usage(test_svnfe_usage);
+	if (argc == 2) {
+		svndump_init(argv[1]);
+		svndump_read(NULL);
+		svndump_deinit();
+		svndump_reset();
+		return 0;
+	}
+	if (argc == 5 && !strcmp(argv[1], "-d")) {
+		struct line_buffer preimage = LINE_BUFFER_INIT;
+		struct line_buffer delta = LINE_BUFFER_INIT;
+		if (buffer_init(&preimage, argv[2]))
+			die_errno("cannot open preimage");
+		if (buffer_init(&delta, argv[3]))
+			die_errno("cannot open delta");
+		if (svndiff0_apply(&delta, (off_t) strtoull(argv[4], NULL, 0),
+				   &preimage, stdout))
+			return 1;
+		if (buffer_deinit(&preimage))
+			die_errno("cannot close preimage");
+		if (buffer_deinit(&delta))
+			die_errno("cannot close delta");
+		buffer_reset(&preimage);
+		buffer_reset(&delta);
+		return 0;
+	}
+	usage(test_svnfe_usage);
 }
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index 36d2b30..e572a93 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -12,6 +12,7 @@
  * See http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff.
  *
  * svndiff0 ::= 'SVN\0' window window*;
+ * window ::= int int int int int instructions inline_data;
  * int ::= highdigit* lowdigit;
  * highdigit ::= # binary 1000 0000 OR-ed with 7 bit value;
  * lowdigit ::= # 7 bit value;
@@ -76,3 +77,66 @@ static int parse_int(const char **buf, size_t *result, const char *end)
 	return error("Invalid instruction: incomplete integer %"PRIu64,
 		     (uint64_t) rv);
 }
+
+static int read_offset(struct line_buffer *in, off_t *result, off_t *len)
+{
+	uintmax_t val;
+	if (read_int(in, &val, len))
+		return -1;
+	if (val > maximum_signed_value_of_type(off_t))
+		return error("Unrepresentable offset: %"PRIuMAX, val);
+	*result = val;
+	return 0;
+}
+
+static int read_length(struct line_buffer *in, size_t *result, off_t *len)
+{
+	uintmax_t val;
+	if (read_int(in, &val, len))
+		return -1;
+	if (val > SIZE_MAX)
+		return error("Unrepresentable length: %"PRIuMAX, val);
+	*result = val;
+	return 0;
+}
+
+static int apply_one_window(struct line_buffer *delta, off_t *delta_len)
+{
+	size_t out_len;
+	size_t instructions_len;
+	size_t data_len;
+	assert(delta_len);
+
+	/* "source view" offset and length already handled; */
+	if (read_length(delta, &out_len, delta_len) ||
+	    read_length(delta, &instructions_len, delta_len) ||
+	    read_length(delta, &data_len, delta_len))
+		return -1;
+	if (instructions_len > 0)
+		return error("What do you think I am?  A delta applier?");
+	if (data_len > 0)
+		return error("No support for inline data yet");
+	return 0;
+}
+
+int svndiff0_apply(struct line_buffer *delta, off_t delta_len,
+		   struct line_buffer *preimage, FILE *postimage)
+{
+	assert(delta && preimage && postimage);
+
+	if (read_magic(delta, &delta_len))
+		return -1;
+	while (delta_len > 0) {	/* For each window: */
+		off_t pre_off;
+		size_t pre_len;
+		if (read_offset(delta, &pre_off, &delta_len) ||
+		    read_length(delta, &pre_len, &delta_len) ||
+		    apply_one_window(delta, &delta_len))
+			return -1;
+		if (delta_len && buffer_at_eof(delta))
+			return error("Delta ends early! "
+				     "(%"PRIu64" bytes remaining)",
+				     (uint64_t) delta_len);
+	}
+	return 0;
+}
diff --git a/vcs-svn/svndiff.h b/vcs-svn/svndiff.h
new file mode 100644
index 0000000..a986099
--- /dev/null
+++ b/vcs-svn/svndiff.h
@@ -0,0 +1,9 @@
+#ifndef SVNDIFF_H_
+#define SVNDIFF_H_
+
+#include "line_buffer.h"
+
+extern int svndiff0_apply(struct line_buffer *delta, off_t delta_len,
+			  struct line_buffer *preimage, FILE *postimage);
+
+#endif
-- 
1.7.2.3


* [PATCH 03/11] vcs-svn: Read the preimage while applying deltas
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
  2010-10-13  9:19             ` [PATCH 01/11] fixup! vcs-svn: Learn to parse variable-length integers Jonathan Nieder
  2010-10-13  9:21             ` [PATCH 02/11] vcs-svn: Skeleton of an svn delta parser Jonathan Nieder
@ 2010-10-13  9:30             ` Jonathan Nieder
  2010-10-14 21:45               ` Sam Vilain
  2010-10-13  9:35             ` [PATCH 04/11] vcs-svn: Read inline data from deltas Jonathan Nieder
                               ` (9 subsequent siblings)
  12 siblings, 1 reply; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-13  9:30 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Sam Vilain

The source view offset heading each svndiff0 window represents a
number of bytes past the beginning of the preimage.  Together with the
source view length, it instructs the delta applier about what portion
of the preimage instructions will refer to.  Read in that data right
away using the sliding window code.

Maybe some day we will mmap() to prepare to read data more lazily.

For compatibility with Subversion's implementation, tolerate source
view offsets pointing past the end of the preimage file (a later
patch will remove this flexibility).  For simplicity, also permit
source views that start within the preimage and end outside of it,
even though Subversion does not.

This does not teach the delta applier to read instructions or copy
data from the source view yet.  Deltas that would produce nonempty
output are still rejected.

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
It occurs to me that Sam Vilain may well have something valuable to
say about this series, having implemented something similar[1].

Sam, this series adds an svndiff0 parser for git to use in parsing
v3 dumps (which are way easier to produce with remote access to an
svn repository than v2 dumps).  The beginning of the series is at [2],
though that cover letter is out of date: now, modulo any new bugs
I've introduced with this reroll, it is known to successfully apply
all the deltas involved in a complete dump of the ASF repo.

I am interested in improvements and complaints of all kinds.

[1] http://search.cpan.org/~samv/Parse-SVNDiff-0.03/lib/Parse/SVNDiff.pm
[2] http://thread.gmane.org/gmane.comp.version-control.git/151086/focus=158731
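
In case the rules are easier to read as code: below is a sketch of the
source view behaviour this patch relies on, inferred from the commit
message and the new tests rather than copied from sliding_window.c
(which is not part of this message).  Both endpoints of the view may
only move forward, and a view that runs past the end of the preimage
is quietly truncated.  The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_view {
	size_t off;	/* left endpoint */
	size_t len;	/* right endpoint is off + len */
};

/* Move the view; refuse if either endpoint would move backward. */
static int sketch_move_view(struct sketch_view *view, size_t off, size_t len)
{
	assert(view);
	if (len > (size_t) -1 - off)
		return -1;	/* off + len would overflow */
	if (off < view->off || off + len < view->off + view->len)
		return -1;	/* an endpoint backtracked */
	view->off = off;
	view->len = len;
	return 0;
}

/* How many bytes of the declared view really exist in the preimage? */
static size_t sketch_view_avail(size_t pre_size, size_t off, size_t len)
{
	if (off >= pre_size)
		return 0;		/* view starts at or past EOF */
	if (len > pre_size - off)
		return pre_size - off;	/* view ends past EOF: truncate */
	return len;
}
```

With the 3-byte preimage "foo" used by the tests, a view of (offset 8,
length 1) has no bytes available and a view of (offset 1, length 8)
has 2, and neither is treated as an error.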

 t/t9011-svn-da.sh |   38 ++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndiff.c |   22 +++++++++++++++-------
 2 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index 8dccd16..b9aad70 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -79,4 +79,42 @@ test_expect_success 'nonempty (but unused) preimage view' '
 	test_cmp empty actual
 '
 
+test_expect_success 'preimage view: right endpoint cannot backtrack' '
+	printf "SVNQ%b%b" "Q\003QQQ" "Q\002QQQ" |
+		q_to_nul >clear.backtrack &&
+	test_must_fail test-svn-fe -d preimage clear.backtrack 14
+'
+
+test_expect_success 'preimage view: left endpoint can advance' '
+	printf "SVNQ%b%b" "Q\003QQQ" "\001\002QQQ" |
+		q_to_nul >clear.preshrink &&
+	printf "SVNQ%b%b" "Q\003QQQ" "\001\001QQQ" |
+		q_to_nul >clear.shrinkbacktrack &&
+	test-svn-fe -d preimage clear.preshrink 14 >actual &&
+	test_must_fail test-svn-fe -d preimage clear.shrinkbacktrack 14 &&
+	test_cmp empty actual
+'
+
+test_expect_success 'preimage view: offsets compared by value' '
+	printf "SVNQ%b%b" "\001\001QQQ" "\0200Q\003QQQ" |
+		q_to_nul >clear.noisybacktrack &&
+	printf "SVNQ%b%b" "\001\001QQQ" "\0200\001\002QQQ" |
+		q_to_nul >clear.noisyadvance &&
+	test_must_fail test-svn-fe -d preimage clear.noisybacktrack 15 &&
+	test-svn-fe -d preimage clear.noisyadvance 15 &&
+	test_cmp empty actual
+'
+
+test_expect_success 'preimage view: accept truncated preimage' '
+	printf "SVNQ%b" "\010QQQQ" | q_to_nul >clear.lateemptyread &&
+	printf "SVNQ%b" "\010\001QQQ" | q_to_nul >clear.latenonemptyread &&
+	printf "SVNQ%b" "\001\010QQQ" | q_to_nul >clear.longread &&
+	test-svn-fe -d preimage clear.lateemptyread 9 >actual.emptyread &&
+	test-svn-fe -d preimage clear.latenonemptyread 9 >actual.nonemptyread &&
+	test-svn-fe -d preimage clear.longread 9 >actual.longread &&
+	test_cmp empty actual.emptyread &&
+	test_cmp empty actual.nonemptyread &&
+	test_cmp empty actual.longread
+'
+
 test_done
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index e572a93..f2876b3 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -4,6 +4,7 @@
  */
 
 #include "git-compat-util.h"
+#include "sliding_window.h"
 #include "line_buffer.h"
 
 /*
@@ -122,21 +123,28 @@ static int apply_one_window(struct line_buffer *delta, off_t *delta_len)
 int svndiff0_apply(struct line_buffer *delta, off_t delta_len,
 		   struct line_buffer *preimage, FILE *postimage)
 {
+	struct view preimage_view = {preimage, 0, STRBUF_INIT};
 	assert(delta && preimage && postimage);
 
 	if (read_magic(delta, &delta_len))
-		return -1;
+		goto fail;
 	while (delta_len > 0) {	/* For each window: */
-		off_t pre_off;
+		off_t pre_off = pre_off;
 		size_t pre_len;
 		if (read_offset(delta, &pre_off, &delta_len) ||
 		    read_length(delta, &pre_len, &delta_len) ||
+		    move_window(&preimage_view, pre_off, pre_len) ||
 		    apply_one_window(delta, &delta_len))
-			return -1;
-		if (delta_len && buffer_at_eof(delta))
-			return error("Delta ends early! "
-				     "(%"PRIu64" bytes remaining)",
-				     (uint64_t) delta_len);
+			goto fail;
+		if (delta_len && buffer_at_eof(delta)) {
+			error("Delta ends early! (%"PRIu64" bytes remaining)",
+			      (uint64_t) delta_len);
+			goto fail;
+		}
 	}
+	strbuf_release(&preimage_view.buf);
 	return 0;
+ fail:
+	strbuf_release(&preimage_view.buf);
+	return -1;
 }
-- 
1.7.2.3


* [PATCH 04/11] vcs-svn: Read inline data from deltas
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
                               ` (2 preceding siblings ...)
  2010-10-13  9:30             ` [PATCH 03/11] vcs-svn: Read the preimage while applying deltas Jonathan Nieder
@ 2010-10-13  9:35             ` Jonathan Nieder
  2010-10-13  9:38             ` [PATCH 05/11] vcs-svn: Read instructions " Jonathan Nieder
                               ` (8 subsequent siblings)
  12 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-13  9:35 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Sam Vilain

Each window of an svndiff0-format delta includes a section for new
data that will be copied into the postimage (in the order it appears
in the window, possibly interspersed with other data).

Read this data when encountering it.  It is not actually necessary to
do so --- it would be just as easy to copy straight from the delta
to output when interpreting the relevant instructions --- but this
way, the code that interprets svndiff0 instructions can proceed more
quickly because it does not require any I/O.

Subversion's implementation rejects deltas that do not consume all
the auxiliary data that is available.  Do not check that for now,
because it would make it impossible to test the function of this
patch until the instructions to consume data are implemented.

Do check for truncated data sections.  Since Subversion's applier
rejects deltas that end before the new-data section is declared to
end, it should be safe for this applier to reject such deltas, too.

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
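The truncation check described above boils down to comparing the
declared section length with the number of delta bytes that remain.
A minimal sketch, assuming the delta is already in memory and that
`*remaining` bytes really are available at `delta + *pos`;
sketch_read_chunk is a hypothetical name, not the read_chunk() from
this patch, which reads through a line_buffer into a strbuf:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a section of declared length `len` into dst, or fail without
 * consuming anything if the delta ends before the section does.
 */
static int sketch_read_chunk(const unsigned char *delta, size_t *pos,
			     size_t *remaining, unsigned char *dst, size_t len)
{
	assert(delta && pos && remaining && dst);
	if (len > *remaining)
		return -1;	/* section declared longer than the delta */
	memcpy(dst, delta + *pos, len);
	*pos += len;
	*remaining -= len;
	return 0;
}
```

In the "truncated inline data" test, the window declares 3 bytes of
data but only 1 byte of the delta is left, so the read is refused.
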
 t/t9011-svn-da.sh |   12 ++++++++++++
 vcs-svn/svndiff.c |   27 ++++++++++++++++++++++++---
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index b9aad70..44832b0 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -117,4 +117,16 @@ test_expect_success 'preimage view: accept truncated preimage' '
 	test_cmp empty actual.longread
 '
 
+test_expect_success 'inline data' '
+	printf "SVNQ%b%s%b%s" "QQQQ\003" "bar" "QQQQ\001" "x" |
+		q_to_nul >inline.clear &&
+	test-svn-fe -d preimage inline.clear 18 >actual &&
+	test_cmp empty actual
+'
+
+test_expect_success 'truncated inline data' '
+	printf "SVNQ%b%s" "QQQQ\003" "b" | q_to_nul >inline.trunc &&
+	test_must_fail test-svn-fe -d preimage inline.trunc 10
+'
+
 test_done
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index f2876b3..c60d732 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -23,6 +23,10 @@
 #define VLI_DIGIT_MASK	0x7f
 #define VLI_BITS_PER_DIGIT 7
 
+struct window {
+	struct strbuf data;
+};
+
 static int read_magic(struct line_buffer *in, off_t *len)
 {
 	static const char magic[] = {'S', 'V', 'N', '\0'};
@@ -101,11 +105,25 @@ static int read_length(struct line_buffer *in, size_t *result, off_t *len)
 	return 0;
 }
 
+static int read_chunk(struct line_buffer *delta, off_t *delta_len,
+		      struct strbuf *buf, size_t len)
+{
+	if (len > maximum_signed_value_of_type(off_t) ||
+	    (off_t) len > *delta_len)
+		return -1;
+	strbuf_reset(buf);
+	buffer_read_binary(buf, len, delta);
+	*delta_len -= buf->len;
+	return 0;
+}
+
 static int apply_one_window(struct line_buffer *delta, off_t *delta_len)
 {
+	struct window ctx = {STRBUF_INIT};
 	size_t out_len;
 	size_t instructions_len;
 	size_t data_len;
+	int rv = 0;
 	assert(delta_len);
 
 	/* "source view" offset and length already handled; */
@@ -115,9 +133,12 @@ static int apply_one_window(struct line_buffer *delta, off_t *delta_len)
 		return -1;
 	if (instructions_len > 0)
 		return error("What do you think I am?  A delta applier?");
-	if (data_len > 0)
-		return error("No support for inline data yet");
-	return 0;
+	if (read_chunk(delta, delta_len, &ctx.data, data_len))
+		return error("Invalid delta: incomplete data section");
+	if (buffer_ferror(delta))
+		rv = error("Cannot read delta: %s", strerror(errno));
+	strbuf_release(&ctx.data);
+	return rv;
 }
 
 int svndiff0_apply(struct line_buffer *delta, off_t delta_len,
-- 
1.7.2.3


* [PATCH 05/11] vcs-svn: Read instructions from deltas
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
                               ` (3 preceding siblings ...)
  2010-10-13  9:35             ` [PATCH 04/11] vcs-svn: Read inline data from deltas Jonathan Nieder
@ 2010-10-13  9:38             ` Jonathan Nieder
  2010-10-13  9:39             ` [PATCH 06/11] vcs-svn: Implement copyfrom_data delta instruction Jonathan Nieder
                               ` (7 subsequent siblings)
  12 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-13  9:38 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Sam Vilain

Buffer the instruction section upon encountering it for later
interpretation.

An alternative design would involve parsing the instructions
at this point and buffering them in some processed form.  Using
the unprocessed form is simpler.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
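To show why keeping the unprocessed form costs little, here is a
sketch of decoding one instruction on demand from the buffered bytes,
following the grammar in the svndiff.c header comment.  The names
(sketch_insn, sketch_next_insn, sketch_vli) are hypothetical, and the
integer decoder does no overflow checking.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_op {
	SKETCH_SOURCE = 0x00,
	SKETCH_TARGET = 0x40,
	SKETCH_DATA = 0x80
};

struct sketch_insn {
	enum sketch_op op;
	size_t len;
	size_t offset;	/* unused for SKETCH_DATA */
};

/* Decode one variable-length integer from [*p, end). */
static int sketch_vli(const unsigned char **p, const unsigned char *end,
		      size_t *out)
{
	size_t rv = 0;
	while (*p < end) {
		unsigned char ch = *(*p)++;
		rv = (rv << 7) | (ch & 0x7f);
		if (!(ch & 0x80)) {
			*out = rv;
			return 0;
		}
	}
	return -1;
}

/* Decode the instruction at *p and advance *p past it. */
static int sketch_next_insn(const unsigned char **p, const unsigned char *end,
			    struct sketch_insn *insn)
{
	unsigned char byte;

	assert(p && *p && insn);
	if (*p >= end)
		return -1;
	byte = *(*p)++;
	if ((byte & 0xc0) == 0xc0)
		return -1;	/* not a valid instruction */
	insn->op = (enum sketch_op) (byte & 0xc0);
	insn->len = byte & 0x3f;
	insn->offset = 0;
	/* A zero packed length means the length follows as an int. */
	if (!insn->len && sketch_vli(p, end, &insn->len))
		return -1;
	/* Source and target copies carry an offset; inline data does not. */
	if (insn->op != SKETCH_DATA && sketch_vli(p, end, &insn->offset))
		return -1;
	return 0;
}
```

An interpreter can call this in a loop until the pointer reaches the
end of the buffered section, so no separate array of parsed
instructions is needed.
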
 t/t9011-svn-da.sh |    5 +++++
 vcs-svn/svndiff.c |   23 ++++++++++++++++++-----
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index 44832b0..1383263 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -129,4 +129,9 @@ test_expect_success 'truncated inline data' '
 	test_must_fail test-svn-fe -d preimage inline.trunc 10
 '
 
+test_expect_success 'truncated inline data (after instruction section)' '
+	printf "SVNQ%b%b%s" "QQ\001\001\003" "\0201" "b" | q_to_nul >insn.trunc &&
+	test_must_fail test-svn-fe -d preimage insn.trunc 11
+'
+
 test_done
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index c60d732..72fe716 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -24,6 +24,7 @@
 #define VLI_BITS_PER_DIGIT 7
 
 struct window {
+	struct strbuf instructions;
 	struct strbuf data;
 };
 
@@ -119,7 +120,7 @@ static int read_chunk(struct line_buffer *delta, off_t *delta_len,
 
 static int apply_one_window(struct line_buffer *delta, off_t *delta_len)
 {
-	struct window ctx = {STRBUF_INIT};
+	struct window ctx = {STRBUF_INIT, STRBUF_INIT};
 	size_t out_len;
 	size_t instructions_len;
 	size_t data_len;
@@ -131,13 +132,25 @@ static int apply_one_window(struct line_buffer *delta, off_t *delta_len)
 	    read_length(delta, &instructions_len, delta_len) ||
 	    read_length(delta, &data_len, delta_len))
 		return -1;
+	if (read_chunk(delta, delta_len, &ctx.instructions, instructions_len))
+		return error("Invalid delta: incomplete instructions section");
+	if (buffer_ferror(delta)) {
+		rv = error("Cannot read delta: %s", strerror(errno));
+		goto done;
+	}
+	if (read_chunk(delta, delta_len, &ctx.data, data_len)) {
+		rv = error("Invalid delta: incomplete data section");
+		goto done;
+	}
+	if (buffer_ferror(delta)) {
+		rv = error("Cannot read delta: %s", strerror(errno));
+		goto done;
+	}
 	if (instructions_len > 0)
 		return error("What do you think I am?  A delta applier?");
-	if (read_chunk(delta, delta_len, &ctx.data, data_len))
-		return error("Invalid delta: incomplete data section");
-	if (buffer_ferror(delta))
-		rv = error("Cannot read delta: %s", strerror(errno));
+ done:
 	strbuf_release(&ctx.data);
+	strbuf_release(&ctx.instructions);
 	return rv;
 }
 
-- 
1.7.2.3


* [PATCH 06/11] vcs-svn: Implement copyfrom_data delta instruction
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
                               ` (4 preceding siblings ...)
  2010-10-13  9:38             ` [PATCH 05/11] vcs-svn: Read instructions " Jonathan Nieder
@ 2010-10-13  9:39             ` Jonathan Nieder
  2010-10-13  9:41             ` [PATCH 07/11] vcs-svn: Check declared number of output bytes Jonathan Nieder
                               ` (6 subsequent siblings)
  12 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-13  9:39 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Sam Vilain

The copyfrom_data instruction copies a few bytes verbatim from the
auxiliary data section of a window to the postimage.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
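Illustration for readers, not part of the patch: per the grammar added
below, an instruction byte keeps the opcode in its top two bits and,
when nonzero, the length in its low six bits; a zero there means the
length follows as a separate int.  A minimal sketch of that decoding
follows.  read_vli() is a hypothetical stand-in for parse_int(), whose
body is not shown in this patch, and decode_insn() is likewise a
made-up name.

```c
#include <assert.h>
#include <stddef.h>

#define INSN_MASK	0xc0
#define OPERAND_MASK	0x3f
#define VLI_CONTINUE	0x80
#define VLI_DIGIT_MASK	0x7f
#define VLI_BITS_PER_DIGIT 7

/* Hypothetical stand-in for parse_int(): base-128, high bit = "more". */
static int read_vli(const unsigned char **p, const unsigned char *end,
		    size_t *out)
{
	size_t value = 0;
	while (*p != end) {
		unsigned char ch = *(*p)++;
		if (value > (((size_t) -1) >> VLI_BITS_PER_DIGIT))
			return -1;	/* shifting would overflow */
		value = (value << VLI_BITS_PER_DIGIT) | (ch & VLI_DIGIT_MASK);
		if (!(ch & VLI_CONTINUE)) {
			*out = value;
			return 0;
		}
	}
	return -1;	/* input ended inside an integer */
}

/*
 * Decode the opcode and length of the instruction at *p and
 * advance *p past them.  Returns 0 on success, -1 on bad input.
 */
static int decode_insn(const unsigned char **p, const unsigned char *end,
		       unsigned int *opcode, size_t *nbytes)
{
	unsigned char first;
	assert(p && *p && *p != end);
	first = *(*p)++;
	*opcode = first & INSN_MASK;
	*nbytes = first & OPERAND_MASK;
	if (*nbytes)	/* packed form: length was in the first byte */
		return 0;
	return read_vli(p, end, nbytes);
}
```

Under these assumptions the single byte \203 decodes to copyfrom_data
with length 3, and the bytes \200 \201 \000 decode to copyfrom_data
with length 128.
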
 t/t9011-svn-da.sh |   31 +++++++++++++++++++
 vcs-svn/svndiff.c |   86 +++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index 1383263..9279924 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -134,4 +134,35 @@ test_expect_success 'truncated inline data (after instruction section)' '
 	test_must_fail test-svn-fe -d preimage insn.trunc 11
 '
 
+test_expect_success 'copyfrom_data' '
+	echo hi >expect &&
+	printf "SVNQ%b%b%b" "QQ\003\001\003" "\0203" "hi\n" | q_to_nul >copydat &&
+	test-svn-fe -d preimage copydat 13 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'multiple copyfrom_data' '
+	echo hi >expect &&
+	printf "SVNQ%b%b%b%b%b" "QQ\003\002\003" "\0201\0202" "hi\n" \
+		"QQQ\002Q" "\0200Q" | q_to_nul >copy.multi &&
+	len=$(wc -c <copy.multi) &&
+	test-svn-fe -d preimage copy.multi $len >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'incomplete multiple insn' '
+	printf "SVNQ%b%b%b" "QQ\003\002\003" "\0203\0200" "hi\n" |
+		q_to_nul >copy.partial &&
+	len=$(wc -c <copy.partial) &&
+	test_must_fail test-svn-fe -d preimage copy.partial $len
+'
+
+test_expect_success 'catch attempt to copy missing data' '
+	printf "SVNQ%b%b%s%b%s" "QQ\002\002\001" "\0201\0201" "X" \
+			"QQQQ\002" "YZ" |
+		q_to_nul >copy.incomplete &&
+	len=$(wc -c <copy.incomplete) &&
+	test_must_fail test-svn-fe -d preimage copy.incomplete $len
+'
+
 test_done
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index 72fe716..ac776e0 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -14,20 +14,40 @@
  *
  * svndiff0 ::= 'SVN\0' window window*;
  * window ::= int int int int int instructions inline_data;
+ * instructions ::= instruction*;
+ * instruction ::= view_selector int int
+ *   | copyfrom_data int
+ *   | packed_view_selector int
+ *   | packed_copyfrom_data
+ *   ;
+ * copyfrom_data ::= # binary 10 000000;
+ * packed_copyfrom_data ::= # copyfrom_data OR-ed with 6 bit value;
  * int ::= highdigit* lowdigit;
  * highdigit ::= # binary 1000 0000 OR-ed with 7 bit value;
  * lowdigit ::= # 7 bit value;
  */
 
+#define INSN_MASK	0xc0
+#define INSN_COPYFROM_DATA	0x80
+#define OPERAND_MASK	0x3f
+
 #define VLI_CONTINUE	0x80
 #define VLI_DIGIT_MASK	0x7f
 #define VLI_BITS_PER_DIGIT 7
 
 struct window {
+	struct strbuf out;
 	struct strbuf instructions;
 	struct strbuf data;
 };
 
+static int write_strbuf(struct strbuf *sb, FILE *out)
+{
+	if (fwrite(sb->buf, 1, sb->len, out) == sb->len)	/* Success. */
+		return 0;
+	return error("Cannot write: %s", strerror(errno));
+}
+
 static int read_magic(struct line_buffer *in, off_t *len)
 {
 	static const char magic[] = {'S', 'V', 'N', '\0'};
@@ -118,9 +138,63 @@ static int read_chunk(struct line_buffer *delta, off_t *delta_len,
 	return 0;
 }
 
-static int apply_one_window(struct line_buffer *delta, off_t *delta_len)
+static int copyfrom_data(struct window *ctx, size_t *data_pos, size_t nbytes)
 {
-	struct window ctx = {STRBUF_INIT, STRBUF_INIT};
+	const size_t pos = *data_pos;
+	if (unsigned_add_overflows(pos, nbytes) ||
+	    pos + nbytes > ctx->data.len)
+		return error("Invalid delta: copies unavailable inline data.");
+	strbuf_add(&ctx->out, ctx->data.buf + pos, nbytes);
+	*data_pos += nbytes;
+	return 0;
+}
+
+static int parse_first_operand(const char **buf, size_t *out, const char *end)
+{
+	size_t result = (unsigned char) *(*buf)++ & OPERAND_MASK;
+	if (result) {
+		*out = result;
+		return 0;
+	}
+	return parse_int(buf, out, end);
+}
+
+static int step(struct window *ctx, const char **instructions, size_t *data_pos)
+{
+	unsigned int instruction;
+	const char *insns_end = ctx->instructions.buf + ctx->instructions.len;
+	size_t nbytes;
+	assert(ctx);
+	assert(instructions && *instructions);
+	assert(data_pos);
+
+	instruction = (unsigned char) **instructions;
+	if (parse_first_operand(instructions, &nbytes, insns_end))
+		return -1;
+	if ((instruction & INSN_MASK) != INSN_COPYFROM_DATA)
+		return error("Unknown instruction %x", instruction);
+	return copyfrom_data(ctx, data_pos, nbytes);
+}
+
+static int apply_window_in_core(struct window *ctx)
+{
+	const char *insn = ctx->instructions.buf;
+	size_t data_pos = 0;
+
+	/*
+	 * Populate ctx->out.buf using data from the source, target,
+	 * and inline data views.
+	 */
+	while (insn != ctx->instructions.buf + ctx->instructions.len)
+		if (step(ctx, &insn, &data_pos))
+			return -1;
+	return 0;
+}
+
+static int apply_one_window(struct line_buffer *delta, off_t *delta_len,
+			    FILE *out)
+{
+	struct window ctx = {STRBUF_INIT, STRBUF_INIT, STRBUF_INIT};
 	size_t out_len;
 	size_t instructions_len;
 	size_t data_len;
@@ -146,8 +220,10 @@ static int apply_one_window(struct line_buffer *delta, off_t *delta_len)
 		rv = error("Cannot read delta: %s", strerror(errno));
 		goto done;
 	}
-	if (instructions_len > 0)
-		return error("What do you think I am?  A delta applier?");
+	if (apply_window_in_core(&ctx) || write_strbuf(&ctx.out, out)) {
+		rv = -1;
+		goto done;
+	}
  done:
 	strbuf_release(&ctx.data);
 	strbuf_release(&ctx.instructions);
@@ -168,7 +244,7 @@ int svndiff0_apply(struct line_buffer *delta, off_t delta_len,
 		if (read_offset(delta, &pre_off, &delta_len) ||
 		    read_length(delta, &pre_len, &delta_len) ||
 		    move_window(&preimage_view, pre_off, pre_len) ||
-		    apply_one_window(delta, &delta_len))
+		    apply_one_window(delta, &delta_len, postimage))
 			goto fail;
 		if (delta_len && buffer_at_eof(delta)) {
 			error("Delta ends early! (%"PRIu64" bytes remaining)",
-- 
1.7.2.3


* [PATCH 07/11] vcs-svn: Check declared number of output bytes
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
                               ` (5 preceding siblings ...)
  2010-10-13  9:39             ` [PATCH 06/11] vcs-svn: Implement copyfrom_data delta instruction Jonathan Nieder
@ 2010-10-13  9:41             ` Jonathan Nieder
  2010-10-13  9:48             ` [PATCH 08/11] vcs-svn: Reject deltas that do not consume all inline data Jonathan Nieder
                               ` (5 subsequent siblings)
  12 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-13  9:41 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Sam Vilain

Check that the declared output size for each window is correct, and
reserve that amount of space in the output buffer in advance.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/svndiff.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index ac776e0..c03cd7e 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -220,10 +220,15 @@ static int apply_one_window(struct line_buffer *delta, off_t *delta_len,
 		rv = error("Cannot read delta: %s", strerror(errno));
 		goto done;
 	}
+	strbuf_grow(&ctx.out, out_len);
 	if (apply_window_in_core(&ctx) || write_strbuf(&ctx.out, out)) {
 		rv = -1;
 		goto done;
 	}
+	if (ctx.out.len != out_len) {
+		rv = error("Invalid delta: incorrect postimage length");
+		goto done;
+	}
  done:
 	strbuf_release(&ctx.data);
 	strbuf_release(&ctx.instructions);
-- 
1.7.2.3


* [PATCH 08/11] vcs-svn: Reject deltas that do not consume all inline data
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
                               ` (6 preceding siblings ...)
  2010-10-13  9:41             ` [PATCH 07/11] vcs-svn: Check declared number of output bytes Jonathan Nieder
@ 2010-10-13  9:48             ` Jonathan Nieder
  2010-10-13  9:50             ` [PATCH 09/11] vcs-svn: Let deltas use data from postimage Jonathan Nieder
                               ` (4 subsequent siblings)
  12 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-13  9:48 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Sam Vilain

The main point is to constrain the format of deltas further,
so corruption and other breakage can be detected more easily.

Requiring deltas not to provide unconsumed data also opens
the possibility of ignoring the declared amount of new data
and simply streaming the data as needed to fulfill
copyfrom_data requests.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
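Illustration for readers, not part of the patch: inline data is
consumed strictly front to back, so a single cursor is enough to
enforce both rules, namely no copy past the end and nothing left over
once the instructions run out.  A minimal sketch, assuming a
hypothetical struct data_cursor in place of the data_pos variable and
ctx->data strbuf used by the real code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cursor over a window's inline data section. */
struct data_cursor {
	const char *buf;
	size_t len;
	size_t pos;
};

/* Copy the next nbytes of inline data to dst; -1 if too little is left. */
static int take_data(struct data_cursor *c, char *dst, size_t nbytes)
{
	assert(c && dst && c->pos <= c->len);
	if (nbytes > c->len - c->pos)
		return -1;	/* copies unavailable inline data */
	memcpy(dst, c->buf + c->pos, nbytes);
	c->pos += nbytes;
	return 0;
}

/* After the last instruction: nonzero only if nothing was left over. */
static int data_fully_consumed(const struct data_cursor *c)
{
	return c->pos == c->len;
}
```

A window that declares three bytes of inline data but carries no
instructions leaves the cursor at 0, so data_fully_consumed() reports
failure; that is the case the renamed 'unconsumed inline data' test
exercises.
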
 t/t9011-svn-da.sh |   12 +++---------
 vcs-svn/svndiff.c |    2 ++
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index 9279924..c9f4768 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -117,20 +117,14 @@ test_expect_success 'preimage view: accept truncated preimage' '
 	test_cmp empty actual.longread
 '
 
-test_expect_success 'inline data' '
+test_expect_success 'unconsumed inline data' '
 	printf "SVNQ%b%s%b%s" "QQQQ\003" "bar" "QQQQ\001" "x" |
 		q_to_nul >inline.clear &&
-	test-svn-fe -d preimage inline.clear 18 >actual &&
-	test_cmp empty actual
+	test_must_fail test-svn-fe -d preimage inline.clear 18 >actual
 '
 
 test_expect_success 'truncated inline data' '
-	printf "SVNQ%b%s" "QQQQ\003" "b" | q_to_nul >inline.trunc &&
-	test_must_fail test-svn-fe -d preimage inline.trunc 10
-'
-
-test_expect_success 'truncated inline data (after instruction section)' '
-	printf "SVNQ%b%b%s" "QQ\001\001\003" "\0201" "b" | q_to_nul >insn.trunc &&
+	printf "SVNQ%b%b%s" "QQ\003\001\003" "\0203" "b" | q_to_nul >insn.trunc &&
 	test_must_fail test-svn-fe -d preimage insn.trunc 11
 '
 
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index c03cd7e..8755c83 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -188,6 +188,8 @@ static int apply_window_in_core(struct window *ctx)
 	while (insn != ctx->instructions.buf + ctx->instructions.len)
 		if (step(ctx, &insn, &data_pos))
 			return -1;
+	if (data_pos != ctx->data.len)
+		return error("Invalid delta: does not copy all new data");
 	return 0;
 }
 
-- 
1.7.2.3


* [PATCH 09/11] vcs-svn: Let deltas use data from postimage
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
                               ` (7 preceding siblings ...)
  2010-10-13  9:48             ` [PATCH 08/11] vcs-svn: Reject deltas that do not consume all inline data Jonathan Nieder
@ 2010-10-13  9:50             ` Jonathan Nieder
  2010-10-13  9:53             ` [PATCH 10/11] vcs-svn: Reject deltas that read past end of preimage Jonathan Nieder
                               ` (3 subsequent siblings)
  12 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-13  9:50 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Sam Vilain

The copyfrom_target instruction appends data that is already
present in the current output view to the end of output.  (The offset
argument is relative to the beginning of output produced by the
current window.)

The region copied is allowed to run past the end of the existing
output.  To support that case, copy one character at a time
rather than using memcpy() or memmove().  This allows copyfrom_target
to be used once to repeat a string many times.  For example:

	COPYFROM_DATA 2
	COPYFROM_TARGET 10, 0
	DATA "ab"

would produce the output "abababababab" (the two bytes of inline
data followed by ten copied bytes).

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
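Illustration for readers, not part of the patch: the reason for
copying one byte at a time is that each byte written may become the
source of a later byte in the same copy.  A minimal sketch on a
fixed-size array follows.  copy_from_target() and its cap parameter
are hypothetical; the patch appends to a growing strbuf with
strbuf_addch() instead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Append nbytes to out[0..*len), reading from out[offset] onward.
 * The read position may run into bytes written by this same call.
 * Returns 0 on success, -1 on an invalid request.
 */
static int copy_from_target(char *out, size_t *len, size_t cap,
			    size_t offset, size_t nbytes)
{
	assert(out && len && *len <= cap);
	if (offset >= *len)
		return -1;	/* first source byte not produced yet */
	if (nbytes > cap - *len)
		return -1;	/* would overrun the fixed buffer */
	while (nbytes--) {
		out[*len] = out[offset++];
		(*len)++;
	}
	return 0;
}
```

Starting from the two bytes "ab", a copy of 10 bytes from offset 0
yields the 12 bytes "abababababab".  A single memcpy() would not be
valid here because the source and destination ranges overlap, and
memmove() would copy only the bytes that existed before the call.
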
 t/t9011-svn-da.sh |   42 ++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndiff.c |   30 ++++++++++++++++++++++++++++--
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index c9f4768..ccd31e9 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -159,4 +159,46 @@ test_expect_success 'catch attempt to copy missing data' '
 	test_must_fail test-svn-fe -d preimage copy.incomplete $len
 '
 
+test_expect_success 'copyfrom target to repeat data' '
+	printf foofoo >expect &&
+	printf "SVNQ%b%b%s" "QQ\006\004\003" "\0203\0100\003Q" "foo" |
+		q_to_nul >copytarget.repeat &&
+	len=$(wc -c <copytarget.repeat) &&
+	test-svn-fe -d preimage copytarget.repeat $len >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'copyfrom target out of order' '
+	printf foooof >expect &&
+	printf "SVNQ%b%b%s" \
+		"QQ\006\007\003" "\0203\0101\002\0101\001\0101Q" "foo" |
+		q_to_nul >copytarget.reverse &&
+	len=$(wc -c <copytarget.reverse) &&
+	test-svn-fe -d preimage copytarget.reverse $len >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'catch copyfrom future' '
+	printf "SVNQ%b%b%s" "QQ\004\004\003" "\0202\0101\002\0201" "XYZ" |
+		q_to_nul >copytarget.infuture &&
+	len=$(wc -c <copytarget.infuture) &&
+	test_must_fail test-svn-fe -d preimage copytarget.infuture $len
+'
+
+test_expect_success 'copy to sustain' '
+	printf XYXYXYXYXYXZ >expect &&
+	printf "SVNQ%b%b%s" "QQ\014\004\003" "\0202\0111Q\0201" "XYZ" |
+		q_to_nul >copytarget.sustain &&
+	len=$(wc -c <copytarget.sustain) &&
+	test-svn-fe -d preimage copytarget.sustain $len >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'catch copy that overflows' '
+	printf "SVNQ%b%b%s" "QQ\003\003\001" "\0201\0177Q" X |
+		q_to_nul >copytarget.overflow &&
+	len=$(wc -c <copytarget.overflow) &&
+	test_must_fail test-svn-fe -d preimage copytarget.overflow $len
+'
+
 test_done
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index 8755c83..8f1b61e 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -20,7 +20,12 @@
  *   | packed_view_selector int
  *   | packed_copyfrom_data
  *   ;
+ * view_selector ::= copyfrom_source
+ *   | copyfrom_target
+ *   ;
+ * copyfrom_target ::= # binary 01 000000;
  * copyfrom_data ::= # binary 10 000000;
+ * packed_view_selector ::= # view_selector OR-ed with 6 bit value;
  * packed_copyfrom_data ::= # copyfrom_data OR-ed with 6 bit value;
  * int ::= highdigit* lowdigit;
  * highdigit ::= # binary 1000 0000 OR-ed with 7 bit value;
@@ -28,6 +33,7 @@
  */
 
 #define INSN_MASK	0xc0
+#define INSN_COPYFROM_TARGET	0x40
 #define INSN_COPYFROM_DATA	0x80
 #define OPERAND_MASK	0x3f
 
@@ -138,6 +144,21 @@ static int read_chunk(struct line_buffer *delta, off_t *delta_len,
 	return 0;
 }
 
+static int copyfrom_target(struct window *ctx, const char **instructions,
+			   size_t nbytes, const char *insns_end)
+{
+	size_t offset;
+	if (parse_int(instructions, &offset, insns_end))
+		return -1;
+	if (offset >= ctx->out.len)
+		return error("Invalid delta: copies from the future.");
+	while (nbytes) {
+		strbuf_addch(&ctx->out, ctx->out.buf[offset++]);
+		nbytes--;
+	}
+	return 0;
+}
+
 static int copyfrom_data(struct window *ctx, size_t *data_pos, size_t nbytes)
 {
 	const size_t pos = *data_pos;
@@ -171,9 +192,14 @@ static int step(struct window *ctx, const char **instructions, size_t *data_pos)
 	instruction = (unsigned char) **instructions;
 	if (parse_first_operand(instructions, &nbytes, insns_end))
 		return -1;
-	if ((instruction & INSN_MASK) != INSN_COPYFROM_DATA)
+	switch (instruction & INSN_MASK) {
+	case INSN_COPYFROM_TARGET:
+		return copyfrom_target(ctx, instructions, nbytes, insns_end);
+	case INSN_COPYFROM_DATA:
+		return copyfrom_data(ctx, data_pos, nbytes);
+	default:
 		return error("Unknown instruction %x", instruction);
-	return copyfrom_data(ctx, data_pos, nbytes);
+	}
 }
 
 static int apply_window_in_core(struct window *ctx)
-- 
1.7.2.3


* [PATCH 10/11] vcs-svn: Reject deltas that read past end of preimage
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
                               ` (8 preceding siblings ...)
  2010-10-13  9:50             ` [PATCH 09/11] vcs-svn: Let deltas use data from postimage Jonathan Nieder
@ 2010-10-13  9:53             ` Jonathan Nieder
  2010-10-13  9:58             ` [PATCH 11/11] vcs-svn: Allow deltas to copy from preimage Jonathan Nieder
                               ` (2 subsequent siblings)
  12 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-13  9:53 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Sam Vilain

Some particularly strange deltas of unknown origin were found to
request copies beyond the end of the preimage, though svn 1.6 never
produces anything like that.

Subversion accepts these perverse deltas as input, but let's error
out on them anyway, so that we notice if some future version of
Subversion starts to actually produce them.

Without this change, the diff applier would have to separately
keep track of the number of bytes supposedly and actually written out.

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 t/t9011-svn-da.sh        |   11 ++++-------
 vcs-svn/sliding_window.c |   10 ++++++----
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index ccd31e9..c4bd1f3 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -105,16 +105,13 @@ test_expect_success 'preimage view: offsets compared by value' '
 	test_cmp empty actual
 '
 
-test_expect_success 'preimage view: accept truncated preimage' '
+test_expect_success 'preimage view: reject truncated preimage' '
 	printf "SVNQ%b" "\010QQQQ" | q_to_nul >clear.lateemptyread &&
 	printf "SVNQ%b" "\010\001QQQ" | q_to_nul >clear.latenonemptyread &&
 	printf "SVNQ%b" "\001\010QQQ" | q_to_nul >clear.longread &&
-	test-svn-fe -d preimage clear.lateemptyread 9 >actual.emptyread &&
-	test-svn-fe -d preimage clear.latenonemptyread 9 >actual.nonemptyread &&
-	test-svn-fe -d preimage clear.longread 9 >actual.longread &&
-	test_cmp empty actual.emptyread &&
-	test_cmp empty actual.nonemptyread &&
-	test_cmp empty actual.longread
+	test_must_fail test-svn-fe -d preimage clear.lateemptyread 9 &&
+	test_must_fail test-svn-fe -d preimage clear.latenonemptyread 9 &&
+	test_must_fail test-svn-fe -d preimage clear.longread 9
 '
 
 test_expect_success 'unconsumed inline data' '
diff --git a/vcs-svn/sliding_window.c b/vcs-svn/sliding_window.c
index 8273970..5c08828 100644
--- a/vcs-svn/sliding_window.c
+++ b/vcs-svn/sliding_window.c
@@ -49,17 +49,19 @@ int move_window(struct view *view, off_t off, size_t len)
 		const off_t gap = off - file_offset;
 		const off_t nread = buffer_skip_bytes(view->file, gap);
 		if (nread != gap) {
-			if (!buffer_ferror(view->file))	/* View ends early. */
-				goto done;
+			if (!buffer_ferror(view->file))
+				return error("Preimage ends early");
 			return error("Cannot seek forward in input: %s",
 				     strerror(errno));
 		}
 		file_offset += gap;
 	}
 	buffer_read_binary(&view->buf, len - view->buf.len, view->file);
-	if (buffer_ferror(view->file))
+	if (view->buf.len != len) {
+		if (!buffer_ferror(view->file))
+			return error("Preimage ends early");
 		return error("Cannot read preimage: %s", strerror(errno));
- done:
+	}
 	view->off = off;
 	return 0;
 }
-- 
1.7.2.3


* [PATCH 11/11] vcs-svn: Allow deltas to copy from preimage
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
                               ` (9 preceding siblings ...)
  2010-10-13  9:53             ` [PATCH 10/11] vcs-svn: Reject deltas that read past end of preimage Jonathan Nieder
@ 2010-10-13  9:58             ` Jonathan Nieder
  2010-10-13 10:00             ` Jonathan Nieder
  2010-10-18 17:00             ` [PATCH/RFC 0/11] Building up the delta parser Ramkumar Ramachandra
  12 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-13  9:58 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

The copyfrom_source instruction appends data from the preimage
buffer to the end of output.  Its arguments are a length and an
offset relative to the beginning of the source view.

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
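Illustration for readers, not part of the patch: the offset operand is
measured from the start of the current source view, not from the start
of the preimage, so the same offset can name different bytes in
different windows.  A minimal sketch with an in-memory view follows.
struct mem_view and copy_from_source() are hypothetical stand-ins for
struct view and the function added below, and the preimage is assumed
to be the three bytes "foo".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the sliding preimage view. */
struct mem_view {
	const char *buf;	/* start of the view, not of the preimage */
	size_t len;
};

/*
 * Append nbytes from the view, starting at the view-relative offset,
 * to out[0..*out_len).  Returns 0 on success, -1 if the request
 * leaves the view or does not fit in the output buffer.
 */
static int copy_from_source(const struct mem_view *in,
			    size_t offset, size_t nbytes,
			    char *out, size_t *out_len, size_t cap)
{
	assert(in && out && out_len && *out_len <= cap);
	if (offset > in->len || nbytes > in->len - offset)
		return -1;	/* copies source data outside view */
	if (nbytes > cap - *out_len)
		return -1;	/* would overrun the fixed buffer */
	memcpy(out + *out_len, in->buf + offset, nbytes);
	*out_len += nbytes;
	return 0;
}
```

With a first view covering bytes 0 to 2 of "foo" and a second view
covering only byte 2, copying one byte from offset 0 of each view
gives "fo", which matches the 'offsets are relative to window' test.
The bounds check is written as a subtraction so that it cannot wrap
around for very large offsets.
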
That's the end of the series.  Thanks for reading.  Hopefully this
round did not introduce too many bugs but if it did, I'd be glad to
hear about them.

Good night,
Jonathan

 t/t9011-svn-da.sh |   35 +++++++++++++++++++++++++++++++++++
 vcs-svn/svndiff.c |   27 +++++++++++++++++++++++----
 2 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index c4bd1f3..c8959e2 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -198,4 +198,39 @@ test_expect_success 'catch copy that overflows' '
 	test_must_fail test-svn-fe -d preimage copytarget.overflow $len
 '
 
+test_expect_success 'copyfrom source' '
+	printf foo >expect &&
+	printf "SVNQ%b%b" "Q\003\003\002Q" "\003Q" | q_to_nul >copysource.all &&
+	test-svn-fe -d preimage copysource.all 11 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'copy backwards' '
+	printf oof >expect &&
+	printf "SVNQ%b%b" "Q\003\003\006Q" "\001\002\001\001\001Q" |
+		q_to_nul >copysource.rev &&
+	test-svn-fe -d preimage copysource.rev 15 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'offsets are relative to window' '
+	printf fo >expect &&
+	printf "SVNQ%b%b%b%b" "Q\003\001\002Q" "\001Q" \
+		"\002\001\001\002Q" "\001Q" |
+		q_to_nul >copysource.two &&
+	test-svn-fe -d preimage copysource.two 18 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'example from notes/svndiff' '
+	printf aaaaccccdddddddd >expect &&
+	printf aaaabbbbcccc >source &&
+	printf "SVNQ%b%b%s" "Q\014\020\007\001" \
+		"\004Q\004\010\0201\0107\010" d |
+		q_to_nul >delta.example &&
+	len=$(wc -c <delta.example) &&
+	test-svn-fe -d source delta.example $len >actual &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index 8f1b61e..d3d1dba 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -23,6 +23,7 @@
  * view_selector ::= copyfrom_source
  *   | copyfrom_target
  *   ;
+ * copyfrom_source ::= # binary 00 000000;
  * copyfrom_target ::= # binary 01 000000;
  * copyfrom_data ::= # binary 10 000000;
  * packed_view_selector ::= # view_selector OR-ed with 6 bit value;
@@ -33,6 +34,7 @@
  */
 
 #define INSN_MASK	0xc0
+#define INSN_COPYFROM_SOURCE	0x00
 #define INSN_COPYFROM_TARGET	0x40
 #define INSN_COPYFROM_DATA	0x80
 #define OPERAND_MASK	0x3f
@@ -42,6 +44,7 @@
 #define VLI_BITS_PER_DIGIT 7
 
 struct window {
+	struct view *in;
 	struct strbuf out;
 	struct strbuf instructions;
 	struct strbuf data;
@@ -144,6 +147,19 @@ static int read_chunk(struct line_buffer *delta, off_t *delta_len,
 	return 0;
 }
 
+static int copyfrom_source(struct window *ctx, const char **instructions,
+			   size_t nbytes, const char *insns_end)
+{
+	size_t offset;
+	if (parse_int(instructions, &offset, insns_end))
+		return -1;
+	if (unsigned_add_overflows(offset, nbytes) ||
+	    offset + nbytes > ctx->in->buf.len)
+		return error("Invalid delta: copies source data outside view.");
+	strbuf_add(&ctx->out, ctx->in->buf.buf + offset, nbytes);
+	return 0;
+}
+
 static int copyfrom_target(struct window *ctx, const char **instructions,
 			   size_t nbytes, const char *insns_end)
 {
@@ -193,12 +209,14 @@ static int step(struct window *ctx, const char **instructions, size_t *data_pos)
 	if (parse_first_operand(instructions, &nbytes, insns_end))
 		return -1;
 	switch (instruction & INSN_MASK) {
+	case INSN_COPYFROM_SOURCE:
+		return copyfrom_source(ctx, instructions, nbytes, insns_end);
 	case INSN_COPYFROM_TARGET:
 		return copyfrom_target(ctx, instructions, nbytes, insns_end);
 	case INSN_COPYFROM_DATA:
 		return copyfrom_data(ctx, data_pos, nbytes);
 	default:
-		return error("Unknown instruction %x", instruction);
+		return error("Invalid instruction %x", instruction);
 	}
 }
 
@@ -220,9 +238,9 @@ static int apply_window_in_core(struct window *ctx)
 }
 
 static int apply_one_window(struct line_buffer *delta, off_t *delta_len,
-			    FILE *out)
+			    struct view *preimage, FILE *out)
 {
-	struct window ctx = {STRBUF_INIT, STRBUF_INIT, STRBUF_INIT};
+	struct window ctx = {preimage, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT};
 	size_t out_len;
 	size_t instructions_len;
 	size_t data_len;
@@ -277,7 +295,8 @@ int svndiff0_apply(struct line_buffer *delta, off_t delta_len,
 		if (read_offset(delta, &pre_off, &delta_len) ||
 		    read_length(delta, &pre_len, &delta_len) ||
 		    move_window(&preimage_view, pre_off, pre_len) ||
-		    apply_one_window(delta, &delta_len, postimage))
+		    apply_one_window(delta, &delta_len,
+				     &preimage_view, postimage))
 			goto fail;
 		if (delta_len && buffer_at_eof(delta)) {
 			error("Delta ends early! (%"PRIu64" bytes remaining)",
-- 
1.7.2.3


* [PATCH 11/11] vcs-svn: Allow deltas to copy from preimage
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
                               ` (10 preceding siblings ...)
  2010-10-13  9:58             ` [PATCH 11/11] vcs-svn: Allow deltas to copy from preimage Jonathan Nieder
@ 2010-10-13 10:00             ` Jonathan Nieder
  2010-10-18 17:00             ` [PATCH/RFC 0/11] Building up the delta parser Ramkumar Ramachandra
  12 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-13 10:00 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier, Sam Vilain

The copyfrom_source instruction appends data from the preimage
buffer to the end of output.  Its arguments are a length and an
offset relative to the beginning of the source view.

Helped-by: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
[resending with cc: samv]

That's the end of the series.  Thanks for reading.  Hopefully this
round did not introduce too many bugs but if it did, I'd be glad to
hear about them.

Good night,
Jonathan

 t/t9011-svn-da.sh |   35 +++++++++++++++++++++++++++++++++++
 vcs-svn/svndiff.c |   27 +++++++++++++++++++++++----
 2 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/t/t9011-svn-da.sh b/t/t9011-svn-da.sh
index c4bd1f3..c8959e2 100755
--- a/t/t9011-svn-da.sh
+++ b/t/t9011-svn-da.sh
@@ -198,4 +198,39 @@ test_expect_success 'catch copy that overflows' '
 	test_must_fail test-svn-fe -d preimage copytarget.overflow $len
 '
 
+test_expect_success 'copyfrom source' '
+	printf foo >expect &&
+	printf "SVNQ%b%b" "Q\003\003\002Q" "\003Q" | q_to_nul >copysource.all &&
+	test-svn-fe -d preimage copysource.all 11 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'copy backwards' '
+	printf oof >expect &&
+	printf "SVNQ%b%b" "Q\003\003\006Q" "\001\002\001\001\001Q" |
+		q_to_nul >copysource.rev &&
+	test-svn-fe -d preimage copysource.rev 15 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'offsets are relative to window' '
+	printf fo >expect &&
+	printf "SVNQ%b%b%b%b" "Q\003\001\002Q" "\001Q" \
+		"\002\001\001\002Q" "\001Q" |
+		q_to_nul >copysource.two &&
+	test-svn-fe -d preimage copysource.two 18 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'example from notes/svndiff' '
+	printf aaaaccccdddddddd >expect &&
+	printf aaaabbbbcccc >source &&
+	printf "SVNQ%b%b%s" "Q\014\020\007\001" \
+		"\004Q\004\010\0201\0107\010" d |
+		q_to_nul >delta.example &&
+	len=$(wc -c <delta.example) &&
+	test-svn-fe -d source delta.example $len >actual &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index 8f1b61e..d3d1dba 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -23,6 +23,7 @@
  * view_selector ::= copyfrom_source
  *   | copyfrom_target
  *   ;
+ * copyfrom_source ::= # binary 00 000000;
  * copyfrom_target ::= # binary 01 000000;
  * copyfrom_data ::= # binary 10 000000;
  * packed_view_selector ::= # view_selector OR-ed with 6 bit value;
@@ -33,6 +34,7 @@
  */
 
 #define INSN_MASK	0xc0
+#define INSN_COPYFROM_SOURCE	0x00
 #define INSN_COPYFROM_TARGET	0x40
 #define INSN_COPYFROM_DATA	0x80
 #define OPERAND_MASK	0x3f
@@ -42,6 +44,7 @@
 #define VLI_BITS_PER_DIGIT 7
 
 struct window {
+	struct view *in;
 	struct strbuf out;
 	struct strbuf instructions;
 	struct strbuf data;
@@ -144,6 +147,19 @@ static int read_chunk(struct line_buffer *delta, off_t *delta_len,
 	return 0;
 }
 
+static int copyfrom_source(struct window *ctx, const char **instructions,
+			   size_t nbytes, const char *insns_end)
+{
+	size_t offset;
+	if (parse_int(instructions, &offset, insns_end))
+		return -1;
+	if (unsigned_add_overflows(offset, nbytes) ||
+	    offset + nbytes > ctx->in->buf.len)
+		return error("Invalid delta: copies source data outside view.");
+	strbuf_add(&ctx->out, ctx->in->buf.buf + offset, nbytes);
+	return 0;
+}
+
 static int copyfrom_target(struct window *ctx, const char **instructions,
 			   size_t nbytes, const char *insns_end)
 {
@@ -193,12 +209,14 @@ static int step(struct window *ctx, const char **instructions, size_t *data_pos)
 	if (parse_first_operand(instructions, &nbytes, insns_end))
 		return -1;
 	switch (instruction & INSN_MASK) {
+	case INSN_COPYFROM_SOURCE:
+		return copyfrom_source(ctx, instructions, nbytes, insns_end);
 	case INSN_COPYFROM_TARGET:
 		return copyfrom_target(ctx, instructions, nbytes, insns_end);
 	case INSN_COPYFROM_DATA:
 		return copyfrom_data(ctx, data_pos, nbytes);
 	default:
-		return error("Unknown instruction %x", instruction);
+		return error("Invalid instruction %x", instruction);
 	}
 }
 
@@ -220,9 +238,9 @@ static int apply_window_in_core(struct window *ctx)
 }
 
 static int apply_one_window(struct line_buffer *delta, off_t *delta_len,
-			    FILE *out)
+			    struct view *preimage, FILE *out)
 {
-	struct window ctx = {STRBUF_INIT, STRBUF_INIT, STRBUF_INIT};
+	struct window ctx = {preimage, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT};
 	size_t out_len;
 	size_t instructions_len;
 	size_t data_len;
@@ -277,7 +295,8 @@ int svndiff0_apply(struct line_buffer *delta, off_t delta_len,
 		if (read_offset(delta, &pre_off, &delta_len) ||
 		    read_length(delta, &pre_len, &delta_len) ||
 		    move_window(&preimage_view, pre_off, pre_len) ||
-		    apply_one_window(delta, &delta_len, postimage))
+		    apply_one_window(delta, &delta_len,
+				     &preimage_view, postimage))
 			goto fail;
 		if (delta_len && buffer_at_eof(delta)) {
 			error("Delta ends early! (%"PRIu64" bytes remaining)",
-- 
1.7.2.3
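
For readers following the new copyfrom_source case, the delta in the
'example from notes/svndiff' test can be worked through by hand. The
sketch below is not code from the patch. `apply_window_sketch` and
`read_small` are hypothetical helpers that assume every operand fits in
a single byte (the real code parses variable-length integers with
parse_int) and that write into a fixed-size buffer rather than a strbuf.
It also assumes, following the svndiff0 format, that a zero 6-bit length
means the length is stored in the byte that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INSN_MASK            0xc0
#define INSN_COPYFROM_SOURCE 0x00
#define INSN_COPYFROM_TARGET 0x40
#define INSN_COPYFROM_DATA   0x80
#define OPERAND_MASK         0x3f

/* Hypothetical: read a one-byte operand, rejecting multi-byte integers. */
static int read_small(const unsigned char *insns, size_t len,
		      size_t *pos, size_t *val)
{
	if (*pos >= len || (insns[*pos] & 0x80))
		return -1;
	*val = insns[(*pos)++];
	return 0;
}

/*
 * Hypothetical: apply one window's instructions to a fixed-size buffer.
 * Returns the number of bytes written, or -1 on a malformed delta.
 */
static long apply_window_sketch(const unsigned char *insns, size_t insns_len,
				const char *src, size_t src_len,
				const char *data, size_t data_len,
				char *out, size_t out_cap)
{
	size_t pos = 0, data_pos = 0, out_len = 0;

	assert(insns && src && data && out);
	while (pos < insns_len) {
		unsigned char insn = insns[pos++];
		size_t nbytes = insn & OPERAND_MASK;
		size_t offset, i;

		/* A zero length field means the length follows separately. */
		if (!nbytes && read_small(insns, insns_len, &pos, &nbytes))
			return -1;
		if (nbytes > out_cap - out_len)
			return -1;
		switch (insn & INSN_MASK) {
		case INSN_COPYFROM_SOURCE:
			if (read_small(insns, insns_len, &pos, &offset))
				return -1;
			/* Written to avoid overflow in offset + nbytes. */
			if (offset > src_len || nbytes > src_len - offset)
				return -1;
			memcpy(out + out_len, src + offset, nbytes);
			break;
		case INSN_COPYFROM_TARGET:
			if (read_small(insns, insns_len, &pos, &offset))
				return -1;
			if (offset >= out_len)
				return -1;
			/* Byte by byte: the copy may read what it just wrote. */
			for (i = 0; i < nbytes; i++)
				out[out_len + i] = out[offset + i];
			break;
		case INSN_COPYFROM_DATA:
			if (nbytes > data_len - data_pos)
				return -1;
			memcpy(out + out_len, data + data_pos, nbytes);
			data_pos += nbytes;
			break;
		default:
			return -1;
		}
		out_len += nbytes;
	}
	return (long)out_len;
}
```

Fed the seven instruction bytes from that test (04 00 04 08 81 47 08),
the source "aaaabbbbcccc" and the inline data "d", the sketch copies
"aaaa" and "cccc" from the source, appends "d" from the data section,
and then repeats that "d" seven more times through an overlapping
copy from the target, giving the 16-byte postimage the test expects.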


* Re: [PATCH 03/11] vcs-svn: Read the preimage while applying deltas
  2010-10-13  9:30             ` [PATCH 03/11] vcs-svn: Read the preimage while applying deltas Jonathan Nieder
@ 2010-10-14 21:45               ` Sam Vilain
  2010-10-14 23:40                 ` Jonathan Nieder
  0 siblings, 1 reply; 79+ messages in thread
From: Sam Vilain @ 2010-10-14 21:45 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ramkumar Ramachandra, Git Mailing List, David Michael Barr,
	Sverre Rabbelier

On Wed, 2010-10-13 at 04:30 -0500, Jonathan Nieder wrote:
> It occurs to me that Sam Vilain may well have something valuable to
> say about this series, having implemented something similar[1].
> 
> Sam, this series adds an svndiff0 parser for git to use in parsing
> v3 dumps (which are way easier to produce with remote access to an
> svn repository than v2 dumps).  The beginning of the series is at [2],
> though that cover letter is out of date: now, modulo any new bugs
> I've introduced with this reroll, it is known to successfully apply
> all the deltas involved in a complete dump of the ASF repo.

> [1] http://search.cpan.org/~samv/Parse-SVNDiff-0.03/lib/Parse/SVNDiff.pm

All I did was make the module lazy - version 0.02 was by 唐鳳, any
detailed knowledge I had of the binary format fell out of my head pretty
quickly :-)

Sam


* Re: [PATCH 03/11] vcs-svn: Read the preimage while applying deltas
  2010-10-14 21:45               ` Sam Vilain
@ 2010-10-14 23:40                 ` Jonathan Nieder
  0 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-14 23:40 UTC (permalink / raw)
  To: 唐鳳
  Cc: Sam Vilain, Ramkumar Ramachandra, Git Mailing List,
	David Michael Barr, Sverre Rabbelier

Sam Vilain wrote:

> All I did was make the module lazy - version 0.02 was by 唐鳳, any
> detailed knowledge I had of the binary format fell out of my head pretty
> quickly :-)

Ah, my bad.  唐鳳, as part of an attempt to natively support fetching
from and pushing to svn repositories, some contributors to the git
project are working on an svndiff0 applier.  If you're interested in
reliving old memories, please feel free to look it over (especially
the test cases).  Thoughts, simplifications, bug reports, improvements
welcome.

http://thread.gmane.org/gmane.comp.version-control.git/151086/focus=158913

Anyone wanting to try it can check out

	git://repo.or.cz/git/jrn.git svn-da

and use the test-svn-fe command:

	make test-svn-fe
	./test-svn-fe -d <preimage> <delta> <delta length>

or the tests:

	make
	cd t && sh t9011-svn-da.sh -v -i

The preimage or delta argument can be /dev/stdin for use in a pipeline.

Thanks for your work on svk and pugs!
Jonathan
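
For anyone who would rather produce a delta to try without the test
suite's printf and q_to_nul plumbing, here is a minimal sketch in C.
The byte values are transcribed from the 'example from notes/svndiff'
test in patch 11/11; the names `example_delta` and
`write_example_delta` are hypothetical and not part of the series.

```c
#include <assert.h>
#include <stdio.h>

/* The 17 bytes built by the 'example from notes/svndiff' test. */
static const unsigned char example_delta[] = {
	'S', 'V', 'N', 0x00,	/* magic: "SVN" followed by version 0 */
	0x00,			/* window: preimage view offset 0 */
	0x0c,			/* preimage view length 12 */
	0x10,			/* postimage length 16 */
	0x07,			/* instructions section: 7 bytes */
	0x01,			/* inline data section: 1 byte */
	0x04, 0x00,		/* copy 4 bytes from source offset 0 */
	0x04, 0x08,		/* copy 4 bytes from source offset 8 */
	0x81,			/* copy 1 byte of inline data */
	0x47, 0x08,		/* copy 7 bytes from target offset 8 */
	'd'			/* inline data */
};

/* Hypothetical helper: write the delta and return the bytes written. */
static size_t write_example_delta(FILE *out)
{
	assert(out);
	return fwrite(example_delta, 1, sizeof(example_delta), out);
}
```

Calling write_example_delta(stdout) from a small main() and piping the
result into "./test-svn-fe -d source /dev/stdin 17", with a file
named source containing aaaabbbbcccc, should reproduce the
aaaaccccdddddddd that the test expects, assuming the command behaves
as described above.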


* Re: [PATCH/RFC 0/11] Building up the delta parser
  2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
                               ` (11 preceding siblings ...)
  2010-10-13 10:00             ` Jonathan Nieder
@ 2010-10-18 17:00             ` Ramkumar Ramachandra
  2010-10-18 17:03               ` Jonathan Nieder
  12 siblings, 1 reply; 79+ messages in thread
From: Ramkumar Ramachandra @ 2010-10-18 17:00 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

Hi Jonathan,

Jonathan Nieder writes:
> Jonathan Nieder wrote:
> 
> > Implement an svndiff 0 interpreter
> 
> I hear that was nigh unreadable, so here's a reroll.  Less
> cargo-cult support for broken deltas, more readability and tests.
> Patches apply on top of "[PATCH 15/16] t9010 (svn-fe): Eliminate
> dependency on svn perl bindings".  As before, the end result
> includes a 'test-svn-fe -d' command that can apply svndiff0-format
> deltas, meaning less binary garbage to worry about as you puzzle
> over that confusing "svnrdump dump" output in debugging sessions.

Thanks for this! The code looks really good and the test suite is
quite comprehensive :)

Eager to see this go through to `master`.

-- Ram


* Re: [PATCH/RFC 0/11] Building up the delta parser
  2010-10-18 17:00             ` [PATCH/RFC 0/11] Building up the delta parser Ramkumar Ramachandra
@ 2010-10-18 17:03               ` Jonathan Nieder
  0 siblings, 0 replies; 79+ messages in thread
From: Jonathan Nieder @ 2010-10-18 17:03 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Sverre Rabbelier

Ramkumar Ramachandra wrote:
> Jonathan Nieder writes:
>> Jonathan Nieder wrote:

>>> Implement an svndiff 0 interpreter
[...]
> Eager to see this go through to `master`.

I think the strbuf use breaks the contrib/svn-fe/svn-fe build
somewhere.  Probably something like the following is needed.

(moving the -I flags off of the compilation command line is
just a "while at it" thing, for consistency with the toplevel
Makefile)

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
diff --git a/contrib/svn-fe/Makefile b/contrib/svn-fe/Makefile
index 360d8da..9e8f174 100644
--- a/contrib/svn-fe/Makefile
+++ b/contrib/svn-fe/Makefile
@@ -6,7 +6,7 @@ MV = mv
 
 CFLAGS = -g -O2 -Wall
 LDFLAGS =
-ALL_CFLAGS = $(CFLAGS)
+ALL_CFLAGS = $(CFLAGS) -I../../vcs-svn -I../..
 ALL_LDFLAGS = $(LDFLAGS)
 EXTLIBS =
 
@@ -38,7 +38,7 @@ svn-fe$X: svn-fe.o $(VCSSVN_LIB) $(GIT_LIB)
 		$(ALL_LDFLAGS) $(LIBS)
 
 svn-fe.o: svn-fe.c ../../vcs-svn/svndump.h
-	$(QUIET_CC)$(CC) -I../../vcs-svn -o $*.o -c $(ALL_CFLAGS) $<
+	$(QUIET_CC)$(CC) -o $*.o -c $(ALL_CFLAGS) $<
 
 svn-fe.html: svn-fe.txt
 	$(QUIET_SUBDIR0)../../Documentation $(QUIET_SUBDIR1) \


Thread overview: 79+ messages
2010-07-15 16:22 [PATCH 0/8] Resurrect rr/svn-export Ramkumar Ramachandra
2010-07-15 16:22 ` [PATCH 1/8] Export parse_date_basic() to convert a date string to timestamp Ramkumar Ramachandra
2010-07-15 17:25   ` Jonathan Nieder
2010-07-15 22:54     ` Junio C Hamano
2010-07-15 16:22 ` [PATCH 2/8] Introduce vcs-svn lib Ramkumar Ramachandra
2010-07-15 17:46   ` Jonathan Nieder
2010-07-15 19:15     ` Ramkumar Ramachandra
2010-07-15 16:22 ` [PATCH 3/8] Add memory pool library Ramkumar Ramachandra
2010-07-15 18:57   ` Jonathan Nieder
2010-07-15 19:12     ` Ramkumar Ramachandra
2010-07-15 16:23 ` [PATCH 4/8] Add treap implementation Ramkumar Ramachandra
2010-07-15 19:09   ` Jonathan Nieder
2010-07-15 19:18     ` Ramkumar Ramachandra
2010-07-15 16:23 ` [PATCH 5/8] Add string-specific memory pool Ramkumar Ramachandra
2010-07-15 16:23 ` [PATCH 6/8] Add stream helper library Ramkumar Ramachandra
2010-07-15 19:19   ` Jonathan Nieder
2010-07-15 16:23 ` [PATCH 7/8] Add infrastructure to write revisions in fast-export format Ramkumar Ramachandra
2010-07-15 19:28   ` Jonathan Nieder
2010-07-15 16:23 ` [PATCH 8/8] Add SVN dump parser Ramkumar Ramachandra
2010-07-15 19:52   ` Jonathan Nieder
2010-07-15 20:04     ` Jonathan Nieder
2010-07-16 10:13 ` [PATCH 0/8] Resurrect rr/svn-export Jonathan Nieder
2010-07-16 10:16   ` [PATCH 3/9] Add memory pool library Jonathan Nieder
2010-07-16 10:23   ` [PATCH 4/9] Add treap implementation Jonathan Nieder
2010-07-16 18:26     ` Jonathan Nieder
2010-08-09 21:57   ` [PATCH 0/10] rr/svn-export reroll Jonathan Nieder
2010-08-09 22:01     ` [PATCH 01/10] Export parse_date_basic() to convert a date string to timestamp Jonathan Nieder
2010-08-09 22:04     ` [PATCH 02/10] Introduce vcs-svn lib Jonathan Nieder
2010-08-09 22:11     ` [PATCH 03/10] Add memory pool library Jonathan Nieder
2010-08-09 22:17     ` [PATCH 04/10] Add treap implementation Jonathan Nieder
2010-08-12 17:22       ` Junio C Hamano
2010-08-12 22:02         ` Jonathan Nieder
2010-08-12 22:11         ` Jonathan Nieder
2010-08-12 22:44           ` Junio C Hamano
2010-08-09 22:34     ` [PATCH 05/10] Add string-specific memory pool Jonathan Nieder
2010-08-12 17:22       ` Junio C Hamano
2010-08-12 21:30         ` Jonathan Nieder
2010-08-09 22:39     ` [PATCH 06/10] Add stream helper library Jonathan Nieder
2010-08-09 22:48     ` [PATCH 07/10] Infrastructure to write revisions in fast-export format Jonathan Nieder
2010-08-09 22:55     ` [PATCH 08/10] SVN dump parser Jonathan Nieder
2010-08-12 17:22       ` Junio C Hamano
2010-08-09 22:55     ` [PATCH 09/10] Update svn-fe manual Jonathan Nieder
2010-08-09 22:58     ` [PATCH 10/10] svn-fe manual: Clarify warning about deltas in dump files Jonathan Nieder
2010-08-10 12:53     ` [PATCH 0/10] rr/svn-export reroll Ramkumar Ramachandra
2010-08-11  1:53       ` Jonathan Nieder
2010-10-11  2:34       ` [PATCH/WIP 00/16] svn delta applier Jonathan Nieder
2010-10-11  2:37         ` [PATCH 01/16] vcs-svn: Eliminate global byte_buffer[] array Jonathan Nieder
2010-10-11  2:39         ` [PATCH 03/16] vcs-svn: Collect line_buffer data in a struct Jonathan Nieder
2010-10-11  2:41         ` [PATCH 04/16] vcs-svn: Teach line_buffer to handle multiple input files Jonathan Nieder
2010-10-11  2:44         ` [PATCH 05/16] vcs-svn: Make buffer_skip_bytes() report partial reads Jonathan Nieder
2010-10-11  2:46         ` [PATCH 06/16] vcs-svn: Improve support for reading large files Jonathan Nieder
2010-10-11  2:47         ` [PATCH 07/16] vcs-svn: Add binary-safe read() function Jonathan Nieder
2010-10-11  2:47         ` [PATCH 08/16] vcs-svn: Let callers peek ahead to find stream end Jonathan Nieder
2010-10-11  2:51         ` [PATCH 09/16] vcs-svn: Allow input errors to be detected early Jonathan Nieder
2010-10-11  2:52         ` [PATCH 10/16] vcs-svn: Allow character-oriented input Jonathan Nieder
2010-10-11  2:53         ` [PATCH 11/16] vcs-svn: Add code to maintain a sliding view of a file Jonathan Nieder
2010-10-11  2:55         ` [PATCH 12/16] vcs-svn: Learn to parse variable-length integers Jonathan Nieder
2010-10-11  2:58         ` [PATCH 13/16] vcs-svn: Learn to check for SVN\0 magic Jonathan Nieder
2010-10-11  2:59         ` [PATCH 14/16] compat: helper for detecting unsigned overflow Jonathan Nieder
2010-10-11  3:00         ` [PATCH 15/16] t9010 (svn-fe): Eliminate dependency on svn perl bindings Jonathan Nieder
2010-10-11  3:11         ` [PATCH 02/16] vcs-svn: Replace buffer_read_string() memory pool with a strbuf Jonathan Nieder
2010-10-11  4:01         ` [PATCH/RFC 16'/16] vcs-svn: Add svn delta parser Jonathan Nieder
2010-10-13  9:17           ` [PATCH/RFC 0/11] Building up the " Jonathan Nieder
2010-10-13  9:19             ` [PATCH 01/11] fixup! vcs-svn: Learn to parse variable-length integers Jonathan Nieder
2010-10-13  9:21             ` [PATCH 02/11] vcs-svn: Skeleton of an svn delta parser Jonathan Nieder
2010-10-13  9:30             ` [PATCH 03/11] vcs-svn: Read the preimage while applying deltas Jonathan Nieder
2010-10-14 21:45               ` Sam Vilain
2010-10-14 23:40                 ` Jonathan Nieder
2010-10-13  9:35             ` [PATCH 04/11] vcs-svn: Read inline data from deltas Jonathan Nieder
2010-10-13  9:38             ` [PATCH 05/11] vcs-svn: Read instructions " Jonathan Nieder
2010-10-13  9:39             ` [PATCH 06/11] vcs-svn: Implement copyfrom_data delta instruction Jonathan Nieder
2010-10-13  9:41             ` [PATCH 07/11] vcs-svn: Check declared number of output bytes Jonathan Nieder
2010-10-13  9:48             ` [PATCH 08/11] vcs-svn: Reject deltas that do not consume all inline data Jonathan Nieder
2010-10-13  9:50             ` [PATCH 09/11] vcs-svn: Let deltas use data from postimage Jonathan Nieder
2010-10-13  9:53             ` [PATCH 10/11] vcs-svn: Reject deltas that read past end of preimage Jonathan Nieder
2010-10-13  9:58             ` [PATCH 11/11] vcs-svn: Allow deltas to copy from preimage Jonathan Nieder
2010-10-13 10:00             ` Jonathan Nieder
2010-10-18 17:00             ` [PATCH/RFC 0/11] Building up the delta parser Ramkumar Ramachandra
2010-10-18 17:03               ` Jonathan Nieder
